Abstract

We propose to investigate test statistics for testing homogeneity in reproducing kernel Hilbert spaces. Asymptotic distributions under the null hypothesis are derived, and consistency under fixed and local alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial data and a speaker verification task is provided.
1. Introduction

An important problem in statistics and machine learning consists in testing whether the distributions of two random variables are identical, against the alternative that they differ in some way. More precisely, let $\{X_1^{(1)}, \dots, X_{n_1}^{(1)}\}$ and $\{X_1^{(2)}, \dots, X_{n_2}^{(2)}\}$ be independent random variables taking values in an arbitrary input space $\mathcal{X}$, with common distributions $\mathbb{P}_1$ and $\mathbb{P}_2$, respectively. The problem consists in testing the null hypothesis of homogeneity $H_0: \mathbb{P}_1 = \mathbb{P}_2$ against the alternative $H_A: \mathbb{P}_1 \neq \mathbb{P}_2$. This problem arises in many applications, ranging from computational anatomy (Grenander and Miller, 2007) to speaker segmentation (Bimbot et al., 2004). We shall allow the input space $\mathcal{X}$ to be quite general, including for example finite-dimensional Euclidean spaces but also function spaces, or more sophisticated structures such as strings or graphs (see Shawe-Taylor and Cristianini, 2004) arising in applications such as bioinformatics (see recently Borgwardt et al., 2006).



Traditional approaches to this problem are based on cumulative distribution functions (cdf), and use a certain distance between the empirical cdfs obtained from the two samples. Popular procedures are the two-sample Kolmogorov-Smirnov tests or the Cramér-von Mises tests (Lehmann and Romano, 2005), which have been frequently used to address these issues, at least for low-dimensional data. Although these tests are popular due to their simplicity, they are known to be insensitive to certain characteristics of the distributions, such as densities containing high-frequency components or local features such as narrow bumps. The low power of the traditional cdf-based test statistics can be improved on by using test statistics based on probability density estimators. Tests based on kernel density estimators have been studied by Anderson et al. (1994) and Allen (1997), using respectively the $L^2$ and $L^1$ distances between densities. More recently, the use of wavelet estimators has been proposed and thoroughly analyzed. Adaptive versions of these tests, that is, where the smoothing parameters of the density estimator are obtained from the data, have been considered by Butucea and Tribouley (2006).



Recently, Gretton et al. (2006) cast the two-sample homogeneity test in a kernel-based framework, and have shown that their test statistic, coined Maximum Mean Discrepancy (MMD), yields as a particular case the $L^2$-distance between kernel density estimators. We propose here to further enhance such an approach by directly incorporating the covariance structure of the probability distributions into our test statistics, yielding, in some sense, a chi-square divergence between the two distributions. For discrete distributions, it is well known that such a normalization yields test statistics with greater power (Lehmann and Romano, 2005).



The paper is organized as follows. In Section 2 and Section 3, we state the main definitions and build our test statistics upon kernel Fisher discriminant analysis. In Section 4, we give the asymptotic distribution of our test statistic under the null hypothesis, and establish the consistency and the limiting distribution of the test for both fixed alternatives and a class of local alternatives. In Section 5, we first investigate the limiting power of our test statistics against directional, then non-directional, sequences of local alternatives in a particular setting, namely when $\mathbb{P}_1$ is the uniform distribution, $\mathbb{P}_2$ is a one-frequency contamination of $\mathbb{P}_1$ on the Fourier basis, and the reproducing kernel belongs to the class of periodic spline kernels; we then compare our test statistics with the MMD test statistics in terms of limiting power. In Section 6, we provide experimental evidence of the performance of our test statistic on a speaker identification task. Detailed proofs are presented in the last sections.

2. Mean and covariance in reproducing kernel Hilbert spaces 

We first highlight the main assumptions on the reproducing kernel, and then introduce operator-theoretic tools for defining the mean element and the covariance operator associated with a reproducing kernel.

2.1 Reproducing kernel Hilbert spaces 

Let $(\mathcal{X}, d)$ be a separable measurable metric space, and denote by $\mathfrak{X}$ the associated $\sigma$-algebra. Let $X$ be an $\mathcal{X}$-valued random variable with probability measure $\mathbb{P}$; the expectation with respect to $\mathbb{P}$ is denoted by $\mathbb{E}$. Consider a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ of functions from $\mathcal{X}$ to $\mathbb{R}$. The Hilbert space $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) if, at each $x \in \mathcal{X}$, the point evaluation operator $\delta_x: \mathcal{H} \to \mathbb{R}$, which maps $f \in \mathcal{H}$ to $f(x) \in \mathbb{R}$, is a bounded linear functional. To each point $x \in \mathcal{X}$ there corresponds an element $\Phi(x) \in \mathcal{H}$ such that $\langle \Phi(x), f \rangle_{\mathcal{H}} = f(x)$ for all $f \in \mathcal{H}$, and $\langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}} = k(x, y)$, where $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a positive definite kernel (Aronszajn, 1950). In this situation, $\Phi(\cdot)$ is the Aronszajn map, and we denote by $\|f\|_{\mathcal{H}} = \langle f, f \rangle_{\mathcal{H}}^{1/2}$ the associated norm. It is assumed from now on that $\mathcal{H}$ is a separable Hilbert space. Note that this is always the case if $\mathcal{X}$ is a separable metric space and if the kernel is continuous (see Steinwart et al., 2006a). We make the following two assumptions on the kernel:



(A1) The kernel $k$ is bounded, that is, $|k|_\infty \overset{\text{def}}{=} \sup_{(x, y) \in \mathcal{X} \times \mathcal{X}} k(x, y) < \infty$.

(A2) For all probability distributions $\mathbb{P}$ on $(\mathcal{X}, \mathfrak{X})$, the RKHS associated with $k(\cdot, \cdot)$ is dense in $L^2(\mathbb{P})$.

Note that some of our results (such as the limiting distribution under the null hypothesis) are valid without assumption (A2), while consistency results against fixed or local alternatives do need (A2). Assumption (A2) is true in particular for the Gaussian kernel on $\mathbb{R}^d$, as shown in (Steinwart et al., 2006b, Theorem 2), and $\mathcal{X}$ may also be a discrete space (Steinwart et al., 2006b, Corollary 3).



2.2 Mean element and covariance operator 

We shall need some operator-theoretic tools (see Aubin, 2000) to define mean elements and covariance operators. A linear operator $T$ is said to be bounded if there is a number $C$ such that $\|Tf\|_{\mathcal{H}} \le C \|f\|_{\mathcal{H}}$ for all $f \in \mathcal{H}$. The operator norm of $T$ is then defined as the minimum of such numbers $C$, that is, $\|T\| = \sup_{\|f\|_{\mathcal{H}} \le 1} \|Tf\|_{\mathcal{H}}$. Furthermore, a bounded linear operator $T$ is said to be Hilbert-Schmidt if the Hilbert-Schmidt norm $\|T\|_{HS} = \{\sum_{p=1}^{\infty} \langle T e_p, T e_p \rangle_{\mathcal{H}}\}^{1/2}$ is finite, where $\{e_p\}_{p \ge 1}$ is any complete orthonormal basis of $\mathcal{H}$. Note that $\|T\|_{HS}$ is independent of the choice of the orthonormal basis. We shall make frequent use of tensor product notation. The tensor product operator $u \otimes v$ for $u, v \in \mathcal{H}$ is defined for all $f \in \mathcal{H}$ as $(u \otimes v) f = \langle v, f \rangle_{\mathcal{H}}\, u$.

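We now introduce the mean element and covariance operator (see Blanchard et al., 2007). If $\int k^{1/2}(x, x)\, \mathbb{P}(dx) < \infty$, the mean element $\mu_{\mathbb{P}}$ is defined as the unique element in $\mathcal{H}$ satisfying, for all functions $f \in \mathcal{H}$,

$$\langle \mu_{\mathbb{P}}, f \rangle_{\mathcal{H}} \overset{\text{def}}{=} \mathbb{P} f \overset{\text{def}}{=} \int f \, d\mathbb{P}. \tag{1}$$

If furthermore $\int k(x, x)\, \mathbb{P}(dx) < \infty$, then the covariance operator $\Sigma_{\mathbb{P}}$ is defined as the unique linear operator onto $\mathcal{H}$ satisfying, for all $f, g \in \mathcal{H}$,

$$\langle f, \Sigma_{\mathbb{P}} g \rangle_{\mathcal{H}} \overset{\text{def}}{=} \int (f - \mathbb{P} f)(g - \mathbb{P} g)\, d\mathbb{P}, \tag{2}$$

that is, $\langle f, \Sigma_{\mathbb{P}} g \rangle_{\mathcal{H}}$ is the covariance between $f(X)$ and $g(X)$, where $X$ is distributed according to $\mathbb{P}$. Note that the mean element and covariance operator are well defined when (A1) is satisfied. Moreover, when assumption (A2) is satisfied, the map $\mathbb{P} \mapsto \mu_{\mathbb{P}}$ is injective. Note also that the operator $\Sigma_{\mathbb{P}}$ is a self-adjoint nonnegative trace-class operator. In the sequel, the dependence of $\mu_{\mathbb{P}}$ and $\Sigma_{\mathbb{P}}$ on $\mathbb{P}$ is omitted whenever there is no risk of confusion.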



We now define what we later denote by $\Sigma^{-1/2}$ in our proofs. For a compact operator $\Sigma$, the range $\mathcal{R}(\Sigma^{1/2})$ of $\Sigma^{1/2}$ is defined as $\mathcal{R}(\Sigma^{1/2}) = \{\Sigma^{1/2} f, f \in \mathcal{H}\}$, and may be characterized by $\mathcal{R}(\Sigma^{1/2}) = \{f \in \mathcal{H},\ \sum_{p=1}^{\infty} \lambda_p^{-1} \langle f, e_p \rangle_{\mathcal{H}}^2 < \infty,\ f \perp \mathcal{N}(\Sigma^{1/2})\}$, where $\{\lambda_p, e_p\}_{p \ge 1}$ are the nonzero eigenvalues and eigenvectors of $\Sigma$, and $\mathcal{N}(\Sigma) = \{f \in \mathcal{H}, \Sigma f = 0\}$ is the null space of $\Sigma$, that is, the functions which are constant on the support of $\mathbb{P}$. Defining $\mathcal{R}^{-1}(\Sigma^{1/2}) = \{g \in \mathcal{H},\ g = \sum_{p=1}^{\infty} \lambda_p^{-1/2} \langle f, e_p \rangle_{\mathcal{H}} e_p,\ f \in \mathcal{R}(\Sigma^{1/2})\}$, we observe that $\Sigma^{1/2}$ is a one-to-one mapping between $\mathcal{R}^{-1}(\Sigma^{1/2})$ and $\mathcal{R}(\Sigma^{1/2})$. Thus, restricting the domain of $\Sigma^{1/2}$ to $\mathcal{R}^{-1}(\Sigma^{1/2})$, we may define its inverse for all $f \in \mathcal{R}(\Sigma^{1/2})$ as $\Sigma^{-1/2} f = \sum_{p=1}^{\infty} \lambda_p^{-1/2} \langle f, e_p \rangle_{\mathcal{H}} e_p$. The null space may be reduced to the null element (in particular for the Gaussian kernel), or may be infinite-dimensional. Similarly, there may be infinitely many strictly positive eigenvalues (true nonparametric case) or finitely many (underlying finite-dimensional problems).

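Given a sample $\{X_1, \dots, X_n\}$, the empirical estimates of the mean element and of the covariance operator are respectively defined as follows:

$$\hat{\mu} \overset{\text{def}}{=} n^{-1} \sum_{i=1}^{n} k(X_i, \cdot), \tag{3}$$

$$\hat{\Sigma} \overset{\text{def}}{=} n^{-1} \sum_{i=1}^{n} k(X_i, \cdot) \otimes k(X_i, \cdot) - \hat{\mu} \otimes \hat{\mu}. \tag{4}$$

By the reproducing property, they lead, on the one hand, to empirical means, since from (3) we have $\langle \hat{\mu}, f \rangle_{\mathcal{H}} = n^{-1} \sum_{i=1}^{n} f(X_i)$ for all $f \in \mathcal{H}$, and, on the other hand, to empirical covariances, since from (4) we have $\langle f, \hat{\Sigma} g \rangle_{\mathcal{H}} = n^{-1} \sum_{i=1}^{n} f(X_i) g(X_i) - \{n^{-1} \sum_{i=1}^{n} f(X_i)\}\{n^{-1} \sum_{i=1}^{n} g(X_i)\}$ for all $f, g \in \mathcal{H}$.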
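As a quick numerical illustration of (3)-(4), here is a minimal Python/NumPy sketch (the function names are ours, chosen for illustration only). It relies on the standard fact that the nonzero eigenvalues of $\hat{\Sigma}$ coincide with those of the centered Gram matrix $n^{-1} P_n K P_n$, with $K_{ij} = k(X_i, X_j)$ and $P_n = I_n - n^{-1}\mathbf{1}_n\mathbf{1}_n^T$, and on the reproducing-property formula for $\langle f, \hat{\Sigma} g \rangle_{\mathcal{H}}$. The linear kernel $k(x, y) = x^T y$ is assumed only so that the output can be checked: in that case $\hat{\Sigma}$ reduces to the biased sample covariance matrix.

```python
import numpy as np


def centered_gram_eigenvalues(K):
    """Nonzero spectrum of the empirical covariance operator (4), from the Gram matrix K.

    The nonzero eigenvalues of hat Sigma equal those of n^{-1} P_n K P_n,
    where P_n = I_n - n^{-1} 1_n 1_n^T is the centering matrix.
    """
    n = K.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    eigvals = np.linalg.eigvalsh(P @ K @ P / n)[::-1]  # decreasing order
    return np.clip(eigvals, 0.0, None)  # remove tiny negative round-off


def empirical_cov(f_vals, g_vals):
    """<f, hat Sigma g>_H computed from the values f(X_i), g(X_i) (reproducing property)."""
    return np.mean(f_vals * g_vals) - np.mean(f_vals) * np.mean(g_vals)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
lam = centered_gram_eigenvalues(X @ X.T)  # Gram matrix of the linear kernel
C = np.cov(X.T, bias=True)  # biased sample covariance (1/n normalization)
print(lam[:3], np.sort(np.linalg.eigvalsh(C))[::-1])
```

With a nonlinear kernel, only the Gram matrix changes; the spectrum of $\hat{\Sigma}$ is still read off the centered Gram matrix.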

3. KFDA-based test statistic 

Our two-sample homogeneity test can be formulated as follows. Let $\{X_1^{(1)}, \dots, X_{n_1}^{(1)}\}$ and $\{X_1^{(2)}, \dots, X_{n_2}^{(2)}\}$ be two independent identically distributed (iid) samples, respectively from $\mathbb{P}_1$ and $\mathbb{P}_2$, having mean elements and covariance operators given by $(\mu_1, \Sigma_1)$ and $(\mu_2, \Sigma_2)$. We build our test statistics using a (regularized) kernelized version of Fisher discriminant analysis. Denote by $\Sigma_W = (n_1/n)\Sigma_1 + (n_2/n)\Sigma_2$ the pooled covariance operator, where $n = n_1 + n_2$, corresponding to the within-class covariance matrix in the finite-dimensional setting (see Hastie et al., 2001).



3.1 Maximum Kernel Fisher Discriminant Ratio 

Let us denote by $\Sigma_B = (n_1 n_2 / n^2)(\mu_2 - \mu_1) \otimes (\mu_2 - \mu_1)$ the between-class covariance operator. For $a = 1, 2$, denote by $(\hat{\mu}_a, \hat{\Sigma}_a)$ respectively the empirical estimates of the mean element and the covariance operator, defined as previously stated in (3) and (4). Denote by $\hat{\Sigma}_W = (n_1/n)\hat{\Sigma}_1 + (n_2/n)\hat{\Sigma}_2$ the empirical pooled covariance estimator, and by $\hat{\Sigma}_B = (n_1 n_2 / n^2)(\hat{\mu}_2 - \hat{\mu}_1) \otimes (\hat{\mu}_2 - \hat{\mu}_1)$ the empirical between-class covariance operator. Let $\{\gamma_n\}_{n \ge 0}$ be a sequence of strictly positive numbers. The maximum kernel Fisher discriminant ratio serves as a basis of our test statistics:



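$$n \max_{f \in \mathcal{H}} \frac{\langle f, \hat{\Sigma}_B f \rangle_{\mathcal{H}}}{\langle f, (\hat{\Sigma}_W + \gamma_n I) f \rangle_{\mathcal{H}}} = \frac{n_1 n_2}{n} \left\| (\hat{\Sigma}_W + \gamma_n I)^{-1/2} (\hat{\mu}_2 - \hat{\mu}_1) \right\|_{\mathcal{H}}^2, \tag{5}$$

where $I$ denotes the identity operator. Note that if the input space is Euclidean, e.g., $\mathcal{X} = \mathbb{R}^d$, the kernel is linear, $k(x, y) = x^T y$, and $\gamma_n = 0$, this quantity matches the so-called Hotelling's $T^2$-statistic in the two-sample case (Lehmann and Romano, 2005).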
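The identity (5) is a generalized Rayleigh quotient: in finite dimension, the maximum is attained at $f^\star \propto (\hat{\Sigma}_W + \gamma_n I)^{-1}(\hat{\mu}_2 - \hat{\mu}_1)$. The minimal sketch below checks this numerically, assuming the linear kernel on $\mathbb{R}^3$ and synthetic Gaussian data (the helper names are ours). It also checks the Hotelling link: with $\gamma_n = 0$, the pooled covariance $\hat{\Sigma}_W$ uses the $1/n$ normalization, whereas Hotelling's pooled covariance uses $1/(n-2)$, so the two statistics agree up to the factor $n/(n-2)$.

```python
import numpy as np


def linear_kfda_ratio(X1, X2, gamma):
    """Right-hand side of (5) for the linear kernel k(x, y) = x^T y on R^d."""
    n1, n2 = len(X1), len(X2)
    n = n1 + n2
    delta = X2.mean(axis=0) - X1.mean(axis=0)  # hat mu_2 - hat mu_1
    # Pooled within-class covariance (n1/n) hat Sigma_1 + (n2/n) hat Sigma_2 (1/n_a normalization)
    S_w = (n1 * np.cov(X1.T, bias=True) + n2 * np.cov(X2.T, bias=True)) / n
    A = S_w + gamma * np.eye(X1.shape[1])
    f_star = np.linalg.solve(A, delta)  # maximizer of the Rayleigh quotient in (5)
    return n1 * n2 / n * delta @ f_star, f_star, delta, A


def rayleigh(f, delta, A, n1, n2):
    """n <f, hat Sigma_B f> / <f, (hat Sigma_W + gamma I) f>, the quotient maximized in (5)."""
    n = n1 + n2
    return n * (n1 * n2 / n**2) * (f @ delta) ** 2 / (f @ A @ f)


rng = np.random.default_rng(1)
X1 = rng.normal(size=(40, 3))
X2 = rng.normal(loc=0.3, size=(30, 3))
val, f_star, delta, A = linear_kfda_ratio(X1, X2, 0.1)
print(val, rayleigh(f_star, delta, A, 40, 30))  # equal up to round-off
```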
We shall make the following assumptions respectively on $\Sigma_1$ and $\Sigma_2$:

(B1) For $u = 1, 2$, the eigenvalues $\{\lambda_p(\Sigma_u)\}_{p \ge 1}$ satisfy $\sum_{p=1}^{\infty} \lambda_p^{1/2}(\Sigma_u) < \infty$.

(B2) For $u = 1, 2$, there are infinitely many strictly positive eigenvalues $\{\lambda_p(\Sigma_u)\}_{p \ge 1}$ of $\Sigma_u$.

The statistical analysis conducted in Section 4 shall demonstrate, in the case $\gamma_n \to 0$, the need to respectively recenter and rescale (a standard statistical transformation known as studentization) the maximum Fisher discriminant ratio, in order to get a theoretically well-grounded test statistic. These two roles, recentering and rescaling, will be played respectively by $d_1(\hat{\Sigma}_W, \gamma)$ and $d_2(\hat{\Sigma}_W, \gamma)$, where, for a given compact operator $\Sigma$ with decreasing eigenvalues $\lambda_p$, the quantity $d_r(\Sigma, \gamma)$ is defined for all $r \ge 1$ as

$$d_r(\Sigma, \gamma) \overset{\text{def}}{=} \left\{ \sum_{p=1}^{\infty} (\lambda_p + \gamma)^{-r} \lambda_p^{r} \right\}^{1/r}. \tag{6}$$
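The sketch below evaluates (6) from a finite list of eigenvalues and illustrates, in an idealized Gaussian model assumed here for illustration only, why $d_1$ and $d_2$ are the right recentering and rescaling quantities: with weights $w_p = (\lambda_p + \gamma)^{-1}\lambda_p$ and independent standard normal $Z_p$, the sum $\sum_p w_p Z_p^2$ has mean $d_1(\Sigma, \gamma)$ and variance $2 d_2^2(\Sigma, \gamma)$, since $\mathbb{E}[Z_p^2] = 1$ and $\mathrm{Var}(Z_p^2) = 2$. The eigenvalues and $\gamma$ below are arbitrary.

```python
import numpy as np


def d_r(eigvals, gamma, r):
    """d_r(Sigma, gamma) from (6), computed from a (finite) list of eigenvalues."""
    lam = np.asarray(eigvals, dtype=float)
    w = lam / (lam + gamma)  # (lambda_p + gamma)^{-1} lambda_p
    return np.sum(w ** r) ** (1.0 / r)


# Monte Carlo check of the recentering/rescaling roles of d_1 and d_2.
lam = np.array([1.0, 0.5, 0.25, 0.125])
gamma = 0.3
w = lam / (lam + gamma)
rng = np.random.default_rng(0)
S = (rng.standard_normal((200_000, lam.size)) ** 2) @ w  # draws of sum_p w_p Z_p^2
print(S.mean(), d_r(lam, gamma, 1))  # both close to d_1
print(S.var(), 2 * d_r(lam, gamma, 2) ** 2)  # both close to 2 d_2^2
```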



3.2 Computation of the test statistics 

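In practice, the test statistics may be computed thanks to the kernel trick, adapted to kernel Fisher discriminant analysis as outlined in (Shawe-Taylor and Cristianini, 2004, Chapter 6). Let us consider the two samples $\{X_1^{(1)}, \dots, X_{n_1}^{(1)}\}$ and $\{X_1^{(2)}, \dots, X_{n_2}^{(2)}\}$, with $n_1 + n_2 = n$. Denote by $G_u^{(n)}: \mathbb{R}^{n_u} \to \mathcal{H}$, $u = 1, 2$, the linear operators which associate to a vector $\alpha^{(u)} = [\alpha_1^{(u)}, \dots, \alpha_{n_u}^{(u)}]^T$ the vector in $\mathcal{H}$ given by $G_u^{(n)} \alpha^{(u)} = \sum_{j=1}^{n_u} \alpha_j^{(u)} k(X_j^{(u)}, \cdot)$. This operator may be presented in matrix form as

$$G_u^{(n)} = \left[ k(X_1^{(u)}, \cdot), \dots, k(X_{n_u}^{(u)}, \cdot) \right]. \tag{7}$$

We denote by $G_n = [G_1^{(n)}\ G_2^{(n)}]$, and by $K_n = G_n^T G_n$ the full Gram matrix, whose blocks $K_n^{(u,v)} = [G_u^{(n)}]^T G_v^{(n)}$, $u, v \in \{1, 2\}$, are given by $K_n^{(u,v)}(i, j) = k(X_i^{(u)}, X_j^{(v)})$ for $i \in \{1, \dots, n_u\}$, $j \in \{1, \dots, n_v\}$. Define, for any integer $\ell$, $P_\ell = I_\ell - \ell^{-1} \mathbf{1}_\ell \mathbf{1}_\ell^T$, where $\mathbf{1}_\ell$ is the $(\ell \times 1)$ vector whose components are all equal to one and $I_\ell$ is the $(\ell \times \ell)$ identity matrix, and let $N_n$ be given by

$$N_n \overset{\text{def}}{=} \begin{pmatrix} P_{n_1} & 0 \\ 0 & P_{n_2} \end{pmatrix}. \tag{8}$$

Finally, define the vector $m_n = (m_{n,i})_{1 \le i \le n}$ with $m_{n,i} = -n_1^{-1}$ for $i = 1, \dots, n_1$ and $m_{n,i} = n_2^{-1}$ for $i = n_1 + 1, \dots, n_1 + n_2$. With the notations introduced above,

$$\hat{\mu}_2 - \hat{\mu}_1 = G_n m_n, \qquad \hat{\Sigma}_W = n^{-1} G_n N_n N_n^T G_n^T,$$

which implies that

$$\left\langle \hat{\mu}_2 - \hat{\mu}_1, (\hat{\Sigma}_W + \gamma I)^{-1} (\hat{\mu}_2 - \hat{\mu}_1) \right\rangle_{\mathcal{H}} = m_n^T G_n^T \left( n^{-1} G_n N_n N_n^T G_n^T + \gamma I \right)^{-1} G_n m_n.$$

Then, using the matrix inversion lemma, together with $N_n^T = N_n$ and $K_n = G_n^T G_n$, we get

$$\begin{aligned} m_n^T G_n^T \left( n^{-1} G_n N_n N_n^T G_n^T + \gamma I \right)^{-1} G_n m_n &= \gamma^{-1} m_n^T G_n^T \left\{ I - n^{-1} G_n N_n \left( \gamma I + n^{-1} N_n^T G_n^T G_n N_n \right)^{-1} N_n^T G_n^T \right\} G_n m_n \\ &= \gamma^{-1} \left\{ m_n^T K_n m_n - n^{-1} m_n^T K_n N_n \left( \gamma I + n^{-1} N_n K_n N_n \right)^{-1} N_n K_n m_n \right\}. \end{aligned}$$

Hence, the maximum kernel Fisher discriminant ratio may be computed from

$$\frac{n_1 n_2}{n} \left\| (\hat{\Sigma}_W + \gamma_n I)^{-1/2} (\hat{\mu}_2 - \hat{\mu}_1) \right\|_{\mathcal{H}}^2 = \frac{n_1 n_2}{n \gamma_n} \left\{ m_n^T K_n m_n - n^{-1} m_n^T K_n N_n \left( \gamma_n I + n^{-1} N_n K_n N_n \right)^{-1} N_n K_n m_n \right\}.$$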
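The kernel-trick formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kfda_ratio` is ours, and it assumes that the pooled Gram matrix $K_n$ lists the first sample before the second, so that $m_n$ and $N_n$ can be built as in (8). Only $n \times n$ matrices are involved. For the linear kernel, the result can be checked against (5) computed directly in $\mathbb{R}^d$.

```python
import numpy as np


def kfda_ratio(K, n1, n2, gamma):
    """(n1 n2 / n) ||(hat Sigma_W + gamma I)^{-1/2}(hat mu_2 - hat mu_1)||^2 from the Gram matrix.

    K is the (n1 + n2) x (n1 + n2) Gram matrix of the pooled sample, sample 1 first.
    """
    n = n1 + n2
    # m_n: -1/n1 on sample 1, +1/n2 on sample 2, so that G_n m_n = hat mu_2 - hat mu_1.
    m = np.concatenate([-np.ones(n1) / n1, np.ones(n2) / n2])
    # N_n: block-diagonal within-sample centering matrices P_{n1}, P_{n2}, as in (8).
    N = np.zeros((n, n))
    N[:n1, :n1] = np.eye(n1) - np.ones((n1, n1)) / n1
    N[n1:, n1:] = np.eye(n2) - np.ones((n2, n2)) / n2
    NKm = N @ (K @ m)
    inner = gamma * np.eye(n) + N @ K @ N / n  # gamma I + n^{-1} N_n K_n N_n
    correction = NKm @ np.linalg.solve(inner, NKm) / n
    return n1 * n2 / (n * gamma) * (m @ K @ m - correction)


rng = np.random.default_rng(2)
X1 = rng.normal(size=(25, 4))
X2 = rng.normal(loc=0.5, size=(35, 4))
X = np.vstack([X1, X2])
print(kfda_ratio(X @ X.T, 25, 35, gamma=0.2))  # linear kernel, for checking
```

For another kernel, only the Gram matrix passed as `K` changes.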

4. Main results 

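This discussion yields the following normalized test statistic:

$$\hat{T}_n(\gamma_n) = \frac{ \dfrac{n_1 n_2}{n} \left\| (\hat{\Sigma}_W + \gamma_n I)^{-1/2} (\hat{\mu}_2 - \hat{\mu}_1) \right\|_{\mathcal{H}}^2 - d_1(\hat{\Sigma}_W, \gamma_n) }{ \sqrt{2}\, d_2(\hat{\Sigma}_W, \gamma_n) }. \tag{9}$$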
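To go from the ratio (5) to the normalized statistic (9), one needs the eigenvalues of $\hat{\Sigma}_W$. Since $\hat{\Sigma}_W = n^{-1}(G_n N_n)(G_n N_n)^T$, its nonzero eigenvalues coincide with those of $n^{-1} N_n K_n N_n$. The minimal sketch below (helper names are ours) assumes the ratio has already been computed, for instance with the kernel-trick formula of Section 3.2, and checks the spectrum against the pooled covariance matrix in the linear-kernel case.

```python
import numpy as np


def pooled_cov_eigenvalues(K, n1, n2):
    """Eigenvalues of hat Sigma_W = n^{-1} G_n N_n N_n^T G_n^T, read off n^{-1} N_n K_n N_n."""
    n = n1 + n2
    N = np.zeros((n, n))
    N[:n1, :n1] = np.eye(n1) - np.ones((n1, n1)) / n1
    N[n1:, n1:] = np.eye(n2) - np.ones((n2, n2)) / n2
    eig = np.linalg.eigvalsh(N @ K @ N / n)[::-1]  # decreasing order
    return np.clip(eig, 0.0, None)  # remove tiny negative round-off


def d_r(eigvals, gamma, r):
    w = eigvals / (eigvals + gamma)  # (lambda_p + gamma)^{-1} lambda_p
    return np.sum(w ** r) ** (1.0 / r)


def standardized_statistic(ratio, K, n1, n2, gamma):
    """T_n(gamma) of (9), given the ratio (5) and the pooled Gram matrix K (sample 1 first)."""
    lam = pooled_cov_eigenvalues(K, n1, n2)
    return (ratio - d_r(lam, gamma, 1)) / (np.sqrt(2.0) * d_r(lam, gamma, 2))


rng = np.random.default_rng(3)
X1 = rng.normal(size=(20, 3))
X2 = rng.normal(size=(30, 3))
X = np.vstack([X1, X2])
print(standardized_statistic(10.0, X @ X.T, 20, 30, gamma=0.5))  # 10.0 is an arbitrary ratio
```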

In this paper, we first consider the asymptotic behavior of $\hat{T}_n$ under the null hypothesis and against a fixed alternative. This will establish that our nonparametric test procedure is consistent. However, this is not enough, as the convergence of the power to one can be arbitrarily slow. We thus then consider local alternatives.

For all our results, we consider two situations regarding the regularization parameter $\gamma_n$: (a) a situation where $\gamma_n$ is held fixed, in which the limiting distribution is somewhat similar to that of the maximum mean discrepancy test statistics, and (b) a situation where $\gamma_n$ tends to zero slowly enough, in which we obtain qualitatively different results.

4.1 Limiting distribution under null hypothesis 

Throughout this paper, we assume that the proportions $n_1/n$ and $n_2/n$ converge to strictly positive numbers, that is,

$$n_u / n \to \rho_u, \quad \text{as } n = n_1 + n_2 \to \infty, \quad \text{with } \rho_u > 0 \text{ for } u = 1, 2.$$

In this section, we derive the distribution of the test statistics under the null hypothesis $H_0: \mathbb{P}_1 = \mathbb{P}_2$ of homogeneity, which implies $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2 = \Sigma_W$. We first consider the case where the regularization factor is held constant, $\gamma_n = \gamma$. We denote by $\xrightarrow{\mathcal{D}}$ the convergence in distribution.
Theorem 1. Assume (A1) and (B1). Assume in addition that the probability distributions $\mathbb{P}_1$ and $\mathbb{P}_2$ are equal, i.e., $\mathbb{P}_1 = \mathbb{P}_2 = \mathbb{P}$, and that $\gamma_n = \gamma > 0$. Then,

$$\hat{T}_n(\gamma) \xrightarrow{\mathcal{D}} T_\infty(\Sigma_W, \gamma) \overset{\mathcal{D}}{=} 2^{-1/2} d_2^{-1}(\Sigma_W, \gamma) \sum_{p=1}^{\infty} (\lambda_p(\Sigma_W) + \gamma)^{-1} \lambda_p(\Sigma_W) (Z_p^2 - 1), \tag{10}$$

where $\{\lambda_p(\Sigma_W)\}_{p \ge 1}$ are the eigenvalues of the covariance operator $\Sigma_W$, $d_2(\Sigma_W, \gamma)$ is defined in (6), and $Z_p$, $p \ge 1$, are independent standard normal variables.

If the number of non-vanishing eigenvalues is equal to $p$ and if $\gamma = 0$, then the limiting distribution coincides with the limiting distribution of Hotelling's $T^2$ for comparisons of two $p$-dimensional vectors, which is a central chi-square with $p$ degrees of freedom (Lehmann and Romano, 2005). The previous result is similar to what is obtained by Gretton et al. (2006) for the Maximum Mean Discrepancy test statistics (MMD): we obtain a weighted sum of chi-squared distributions with summable weights. For a given level $\alpha \in [0, 1]$, denote by $t_{1-\alpha}(\Sigma_W, \gamma)$ the $(1-\alpha)$-quantile of the distribution of $T_\infty(\Sigma_W, \gamma)$. Then, the sequence of tests $\hat{T}_n(\gamma) > t_{1-\alpha}(\Sigma_W, \gamma)$ is pointwise asymptotically level $\alpha$ to test homogeneity. Because in practice the covariance $\Sigma_W$ is unknown, it is not possible to compute the quantile $t_{1-\alpha}(\Sigma_W, \gamma)$. Nevertheless, this quantile can still be consistently estimated by $t_{1-\alpha}(\hat{\Sigma}_W, \gamma)$, which can be obtained from the sample covariance matrix (see Proposition 24).



Corollary 2. The test $\hat{T}_n(\gamma) > t_{1-\alpha}(\hat{\Sigma}_W, \gamma)$ is pointwise asymptotically level $\alpha$.

In practice, the quantile $t_{1-\alpha}(\hat{\Sigma}_W, \gamma)$ can be numerically computed by inverse Laplace transform (see Strawderman, 2004; Hughett, 1998).
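The text computes $t_{1-\alpha}(\hat{\Sigma}_W, \gamma)$ by inverse Laplace transform. A simpler alternative, which we substitute here purely for illustration, is Monte Carlo simulation of (10) with $\Sigma_W$ replaced by the eigenvalues of $\hat{\Sigma}_W$ (obtained, e.g., from $n^{-1} N_n K_n N_n$). It is less accurate in the far tail for a given budget. The function name and default sample size below are our assumptions.

```python
import numpy as np


def null_quantile_mc(eigvals, gamma, alpha=0.05, n_mc=20_000, seed=0):
    """Monte Carlo estimate of t_{1-alpha}, the (1 - alpha)-quantile of T_infinity in (10).

    eigvals: eigenvalues of the (estimated) pooled covariance operator, e.g. of hat Sigma_W.
    """
    lam = np.asarray(eigvals, dtype=float)
    lam = lam[lam > 0]  # zero eigenvalues carry zero weight
    w = lam / (lam + gamma)  # (lambda_p + gamma)^{-1} lambda_p
    d2 = np.sqrt(np.sum(w ** 2))
    Z = np.random.default_rng(seed).standard_normal((n_mc, lam.size))
    T = ((Z ** 2 - 1.0) @ w) / (np.sqrt(2.0) * d2)  # draws of T_infinity(Sigma, gamma)
    return np.quantile(T, 1.0 - alpha)


print(null_quantile_mc(np.ones(200), gamma=0.0))
```

Two sanity checks are available in closed form. With 200 equal eigenvalues and $\gamma = 0$, $T_\infty$ is a standardized $\chi^2_{200}$ variable, so its 95% quantile is close to $(233.99 - 200)/20 \approx 1.70$. With a single eigenvalue, it is $(3.84 - 1)/\sqrt{2} \approx 2.01$.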



For all $\gamma > 0$, the weights $\{(\lambda_p + \gamma)^{-1} \lambda_p\}_{p \ge 1}$ are summable. However, if Assumption (B2) is satisfied, both $d_1(\Sigma_W, \gamma)$ and $d_2(\Sigma_W, \gamma)$ tend to infinity when $\gamma \to 0$. The following theorem shows that if $\gamma_n$ tends to zero slowly enough, then our test statistic is asymptotically normal:

Theorem 3. Assume (A1) and (B1)-(B2). Assume in addition that the probability distributions $\mathbb{P}_1$ and $\mathbb{P}_2$ are equal, i.e., $\mathbb{P}_1 = \mathbb{P}_2 = \mathbb{P}$, and that the sequence $\{\gamma_n\}$ is such that

$$\gamma_n + d_2^{-1}(\Sigma_W, \gamma_n)\, d_1(\Sigma_W, \gamma_n)\, \gamma_n^{-1} n^{-1/2} \to 0;$$

then

$$\hat{T}_n(\gamma_n) \xrightarrow{\mathcal{D}} \mathcal{N}(0, 1).$$

The proof of the theorem is postponed to Section …. Under the assumptions of Theorem 3, the sequence of tests that rejects the null hypothesis when $\hat{T}_n(\gamma_n) > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution, is asymptotically level $\alpha$.

Contrary to the case where $\gamma_n = \gamma$, the limiting distribution depends neither on the reproducing kernel nor on the sequence of regularization parameters $\{\gamma_n\}_{n \ge 1}$. However, notice that $d_2^{-1}(\Sigma_W, \gamma_n) d_1(\Sigma_W, \gamma_n) \gamma_n^{-1} n^{-1/2} \to 0$ requires that $\{\gamma_n\}_{n \ge 1}$ goes to zero at a slower rate than $n^{-1/2}$ (indeed, since the weights $(\lambda_p + \gamma_n)^{-1}\lambda_p$ lie in $[0, 1]$, we have $d_1 \ge d_2^2$, so that $d_2^{-1} d_1 \ge d_2 \to \infty$). For instance, if the eigenvalues $\{\lambda_p\}_{p \ge 1}$ decrease at a polynomial rate, that is, if there exists $s > 0$ such that $\lambda_p = p^{-2s}$ for all $p \ge 1$, then, by Lemma 20, we have $d_1(\Sigma_W, \gamma_n) \sim \gamma_n^{-1/2s}$ and $d_2(\Sigma_W, \gamma_n) \sim \gamma_n^{-1/4s}$ as $n \to \infty$. Therefore, the condition $d_2^{-1}(\Sigma_W, \gamma_n) d_1(\Sigma_W, \gamma_n) \gamma_n^{-1} n^{-1/2} \to 0$ entails in this particular case that $\gamma_n^{-1} = o(n^{2s/(1+4s)})$, where the rate of decay $s$ of the eigenvalues of the covariance operator $\Sigma_W$ depends both on the kernel and on the underlying distribution $\mathbb{P}_1 = \mathbb{P}_2 = \mathbb{P}$. Besides, it may seem surprising that the limiting distribution is normal. This is due to two facts. First, we regularize the sample covariance operator prior to inversion (being of finite rank, the inverse of $\hat{\Sigma}_W$ is obviously not defined). Second, the problem is here truly infinite-dimensional, because we have assumed that there are infinitely many strictly positive eigenvalues, $\lambda_p(\Sigma_W) > 0$ for all $p$ (see Lehmann and Romano, 2005, Theorem 14.4.2, for a related result).
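The polynomial-decay rates can be checked numerically. The sketch below assumes eigenvalues $\lambda_p = p^{-2s}$ (the parametrization under which the rate $o(n^{2s/(1+4s)})$ above holds), truncated at $10^6$ terms, and verifies that dividing $\gamma$ by 16 multiplies $d_1$ by about $16^{1/2s}$ and $d_2$ by about $16^{1/4s}$. The function names are ours.

```python
import numpy as np


def d1_d2_polynomial(gamma, s, n_terms=10**6):
    """d_1 and d_2 of (6) for eigenvalues lambda_p = p^{-2s}, truncated at n_terms."""
    p = np.arange(1, n_terms + 1, dtype=float)
    lam = p ** (-2.0 * s)
    w = lam / (lam + gamma)
    return w.sum(), np.sqrt(np.sum(w ** 2))


def max_decay_exponent(s):
    """gamma_n = n^{-kappa} satisfies gamma_n^{-1} = o(n^{2s/(1+4s)}) iff kappa < 2s/(1+4s)."""
    return 2.0 * s / (1.0 + 4.0 * s)


s, gamma = 1.0, 1e-4
d1_a, d2_a = d1_d2_polynomial(gamma, s)
d1_b, d2_b = d1_d2_polynomial(gamma / 16, s)
print(d1_b / d1_a, 16 ** (1 / (2 * s)))  # ratio of d_1 values vs. predicted 16^{1/2s}
print(d2_b / d2_a, 16 ** (1 / (4 * s)))  # ratio of d_2 values vs. predicted 16^{1/4s}
```

For $s = 1$, any $\gamma_n = n^{-\kappa}$ with $\kappa < 0.4$ satisfies the condition of Theorem 3.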



4.2 Limiting behavior against fixed alternatives 

We study the power of the test based on $\hat{T}_n(\gamma_n)$ under alternative hypotheses. The minimal requirement is to prove that this sequence of tests is consistent. A sequence of tests of constant level $\alpha$ is said to be consistent in power if the probability of accepting the null hypothesis of homogeneity goes to zero as the sample size goes to infinity under a fixed alternative. Recall that two probability measures $\mathbb{P}_1$ and $\mathbb{P}_2$ defined on a measurable space $(\mathcal{X}, \mathfrak{X})$ are called singular if there exist two disjoint sets $A$ and $B$ in $\mathfrak{X}$ whose union is $\mathcal{X}$ such that $\mathbb{P}_1$ is zero on all measurable subsets of $B$ while $\mathbb{P}_2$ is zero on all measurable subsets of $A$. This is denoted by $\mathbb{P}_1 \perp \mathbb{P}_2$.

When $\gamma_n = \gamma$ or when $\gamma_n \to 0$, and $\mathbb{P}_1$ and $\mathbb{P}_2$ are not singular, the following proposition shows that the limits in both cases are finite and strictly positive, and, in the latter case, independent of the kernel (see Fukumizu et al., 2008, for similar results for canonical correlation analysis). The following result gives some useful insights on $\|\Sigma_W^{-1/2}(\mu_2 - \mu_1)\|_{\mathcal{H}}$, the population counterpart of $\|(\hat{\Sigma}_W + \gamma_n I)^{-1/2}(\hat{\mu}_2 - \hat{\mu}_1)\|_{\mathcal{H}}$ on which our test statistic is based.

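Proposition 4. Assume (A1)-(A2). Let $\nu$ be a measure dominating $\mathbb{P}_1$ and $\mathbb{P}_2$, and let $p_1$ and $p_2$ be the densities of $\mathbb{P}_1$ and $\mathbb{P}_2$ with respect to $\nu$. The norm $\|\Sigma_W^{-1/2}(\mu_2 - \mu_1)\|_{\mathcal{H}}$ is infinite if and only if $\mathbb{P}_1$ and $\mathbb{P}_2$ are mutually singular. If $\mathbb{P}_1$ and $\mathbb{P}_2$ are nonsingular, $\|\Sigma_W^{-1/2}(\mu_2 - \mu_1)\|_{\mathcal{H}}$ is finite and is given by

$$\|\Sigma_W^{-1/2}(\mu_2 - \mu_1)\|_{\mathcal{H}}^2 = \frac{1}{\rho_1 \rho_2} \left( 1 - \int \frac{p_1 p_2}{\rho_1 p_1 + \rho_2 p_2}\, d\nu \right) \left( \int \frac{p_1 p_2}{\rho_1 p_1 + \rho_2 p_2}\, d\nu \right)^{-1}.$$

It is equal to zero if the $\chi^2$-divergence is null, that is, if and only if $\mathbb{P}_1 = \mathbb{P}_2$.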
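Proposition 4 can be checked exactly on a finite space, which we use here as an illustrative assumption. Take $\mathcal{X} = \{0, \dots, m-1\}$ with the indicator kernel $k(x, y) = \mathbf{1}\{x = y\}$, so that $\mathcal{H} = \mathbb{R}^m$ with the Euclidean inner product, (A2) holds, the mean element of $\mathbb{P}$ is its probability vector $p$, and the covariance operator is the matrix $\mathrm{diag}(p) - p p^T$. Taking $\nu$ as the counting measure, both sides of the proposition can then be computed directly. The Moore-Penrose pseudo-inverse plays the role of $\Sigma_W^{-1/2}$ on the range of $\Sigma_W$, which contains $\mu_2 - \mu_1$ because its entries sum to zero.

```python
import numpy as np


def kernel_side(p1, p2, rho1):
    """||Sigma_W^{-1/2}(mu_2 - mu_1)||^2 for the indicator kernel on {0, ..., m-1}."""
    rho2 = 1.0 - rho1

    def cov(p):  # covariance operator of P, as an m x m matrix
        return np.diag(p) - np.outer(p, p)

    S_w = rho1 * cov(p1) + rho2 * cov(p2)
    delta = p2 - p1  # mu_2 - mu_1: mean elements are the probability vectors here
    return delta @ np.linalg.pinv(S_w) @ delta


def density_side(p1, p2, rho1):
    """Closed form of Proposition 4, with nu the counting measure."""
    rho2 = 1.0 - rho1
    integral = np.sum(p1 * p2 / (rho1 * p1 + rho2 * p2))
    return (1.0 - integral) / (rho1 * rho2 * integral)


p1, p2 = np.array([0.5, 0.5]), np.array([0.7, 0.3])
print(kernel_side(p1, p2, 0.5), density_side(p1, p2, 0.5))  # both equal 0.04 / 0.23
```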



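By combining the two previous propositions, we therefore obtain the following consistency theorem:

Theorem 5. Assume (A1)-(A2). Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be two distributions over $(\mathcal{X}, \mathfrak{X})$ such that $\mathbb{P}_2 \neq \mathbb{P}_1$. If either $\gamma_n = \gamma$ or $\gamma_n + d_2^{-1}(\Sigma_1, \gamma_n) d_1(\Sigma_1, \gamma_n) \gamma_n^{-1} n^{-1/2} \to 0$, then for any $t > 0$,

$$\mathbb{P}_{H_A}\big(\hat{T}_n(\gamma_n) > t\big) \to 1. \tag{11}$$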
4.3 Limiting distribution against local alternatives

When the alternative is fixed, any sensible test procedure will have a power that tends to one as the sample size $n$ tends to infinity. This property is therefore not suitable for comparing the limiting power of different test procedures. Several approaches are possible to address this question. One such approach is to consider sequences of local alternatives (Lehmann and Romano, 2005). Such alternatives tend to the null hypothesis as $n \to \infty$ at a rate such that the limiting distribution of the sequence of test statistics under the sequence of alternatives converges to a non-degenerate random variable. To compare two sequences of tests for a given sequence of alternatives, one may then compute the ratio of the limiting powers, and choose the test which has the largest power.

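In our setting, let $\mathbb{P}_1$ denote a fixed probability on $(\mathcal{X}, \mathfrak{X})$ and let $\mathbb{P}_2^n$ be a sequence of probabilities on $(\mathcal{X}, \mathfrak{X})$. The sequence $\mathbb{P}_2^n$ depends on the sample size $n$ and converges to $\mathbb{P}_1$ as $n$ goes to infinity with respect to a certain distance. In the asymptotic analysis of our test statistics against sequences of local alternatives, the $\chi^2$-divergence $D_{\chi^2}(\mathbb{P}_1 \| \mathbb{P}_2^n)$ is defined for all $n$ as

$$D_{\chi^2}(\mathbb{P}_1 \| \mathbb{P}_2^n) \overset{\text{def}}{=} \eta_n \overset{\text{def}}{=} \left\| \frac{d\mathbb{P}_2^n}{d\mathbb{P}_1} - 1 \right\|_{L^2(\mathbb{P}_1)} \tag{12}$$

for $\mathbb{P}_2^n$ absolutely continuous with respect to $\mathbb{P}_1$. Therefore, in the subsequent sections, we shall make the following assumption:

(C) For any $n$, $\mathbb{P}_2^n$ is absolutely continuous with respect to $\mathbb{P}_1$, and $D_{\chi^2}(\mathbb{P}_1 \| \mathbb{P}_2^n) \to 0$ as $n$ tends to infinity.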
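For intuition on the scale of $\eta_n$ in (12), consider the kind of contamination used later in Section 5: $\mathbb{P}_1$ uniform on $[0, 1]$ and $d\mathbb{P}_2^n / d\mathbb{P}_1 = 1 + A n^{-1/2} c_q$. Assuming $c_q(t) = \sqrt{2}\sin(2\pi q t)$ (a unit-norm Fourier component in $L^2(\mathbb{P}_1)$, chosen here for illustration), $\eta_n = A n^{-1/2}$ exactly, so $n\eta_n^2 = A^2$ stays bounded. The sketch below computes $\eta_n$ by the midpoint rule. The function name is ours.

```python
import numpy as np


def eta_directional(A, n, q, grid=100_000):
    """eta_n of (12) for dP_2^n/dP_1 = 1 + A n^{-1/2} c_q, with P_1 uniform on [0, 1]."""
    t = (np.arange(grid) + 0.5) / grid  # midpoints of a uniform grid on [0, 1]
    c_q = np.sqrt(2.0) * np.sin(2.0 * np.pi * q * t)  # assumed unit-L2-norm Fourier component
    density_ratio = 1.0 + A * n ** -0.5 * c_q
    if density_ratio.min() < 0.0:
        raise ValueError("1 + A n^{-1/2} c_q is not a valid density ratio for these A, n")
    # L2(P_1) norm of (density ratio - 1), approximated by the midpoint rule
    return np.sqrt(np.mean((density_ratio - 1.0) ** 2))


print(eta_directional(A=2.0, n=400, q=3))  # A n^{-1/2} = 0.1
```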

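The following theorem shows that, under local alternatives, we get a series of shifted chi-squared distributions when $\gamma_n = \gamma$:

Theorem 6. Assume (A1), (B1), and (C). Assume in addition that $\gamma_n = \gamma > 0$ and that $n \eta_n^2 = O(1)$; then

$$\hat{T}_n(\gamma) \xrightarrow{\mathcal{D}} 2^{-1/2} d_2^{-1}(\Sigma_1, \gamma) \sum_{p=1}^{\infty} (\lambda_p(\Sigma_1) + \gamma)^{-1} \lambda_p(\Sigma_1) \left\{ (Z_p + a_{n,p}(\gamma))^2 - 1 \right\},$$

with

$$a_{n,p}(\gamma) = (n_1 n_2 / n)^{1/2} \left\langle (\Sigma_1 + \gamma I)^{-1/2} (\mu_2 - \mu_1), e_p \right\rangle_{\mathcal{H}}, \tag{13}$$

where $\{e_p\}_{p \ge 1}$ are the eigenvectors of $\Sigma_1$, and $\{Z_p\}_{p \ge 1}$ are independent standard normal random variables, defined on a common probability space.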

When the sequence of regularization parameters $\{\gamma_n\}_{n \ge 1}$ tends to zero at a slower rate than $n^{-1/2}$, the test statistic is shown to be asymptotically normal, with the same limiting variance as under the null hypothesis, but with a non-zero limiting mean, as detailed in the next two results. While the former states the asymptotic normality under general conditions, the latter highlights the fact that the asymptotic mean shift in the limiting distribution may be conveniently expressed from the limiting $\chi^2$-divergence of $\mathbb{P}_1$ and $\mathbb{P}_2^n$ under additional smoothness assumptions on the spectrum of the covariance operator.



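Theorem 7. Assume (A1), (B1)-(B2), and (C). Let $\{\gamma_n\}_{n \ge 1}$ be a sequence such that

$$\gamma_n + d_2^{-1}(\Sigma_1, \gamma_n)\, d_1(\Sigma_1, \gamma_n)\, \gamma_n^{-1} n^{-1/2} \to 0, \tag{14}$$

$$d_2^{-1}(\Sigma_1, \gamma_n)\, n \eta_n^2 = O(1) \quad \text{and} \quad d_2^{-1}(\Sigma_1, \gamma_n)\, d_1(\Sigma_1, \gamma_n)\, \eta_n \to 0, \tag{15}$$

where $\{\eta_n\}_{n \ge 1}$ is defined in (12). If the following limit exists,

$$\Delta \overset{\text{def}}{=} \lim_{n \to \infty} \frac{ n \left\| (\Sigma_1 + \gamma_n I)^{-1/2} (\mu_2 - \mu_1) \right\|_{\mathcal{H}}^2 }{ d_2(\Sigma_1, \gamma_n) },$$

then

$$\hat{T}_n(\gamma_n) \xrightarrow{\mathcal{D}} \mathcal{N}(\rho_1 \rho_2 \Delta, 1).$$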
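Corollary 8. Under the assumptions of Theorem 7, if there exists $\alpha > 0$ such that

$$\left\langle \mu_2 - \mu_1, \Sigma_1^{-1-\alpha} (\mu_2 - \mu_1) \right\rangle_{\mathcal{H}} < \infty,$$

and if the following limit exists,

$$\Delta = \lim_{n \to \infty} d_2(\Sigma_1, \gamma_n)^{-1} n \eta_n^2,$$

then

$$\hat{T}_n(\gamma_n) \xrightarrow{\mathcal{D}} \mathcal{N}(\rho_1 \rho_2 \Delta, 1).$$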

It is worthwhile to note that $\rho_1 \rho_2 \Delta$, the limiting mean shift of our test statistics against sequences of local alternatives, does not depend on the choice of the reproducing kernel. This means that, at least in the large-sample setting $n \to \infty$, the choice of the kernel is irrelevant, provided that for some $\alpha > 0$ we have $\langle \mu_2 - \mu_1, \Sigma_1^{-1-\alpha}(\mu_2 - \mu_1) \rangle_{\mathcal{H}} < \infty$. Then, we get that the sequences of local alternatives converge to the null at rate $\eta_n = C\, d_2^{1/2}(\Sigma_1, \gamma_n)\, n^{-1/2}$ for some constant $C > 0$, which is slower than the usual parametric rate $n^{-1/2}$ since $d_2(\Sigma_1, \gamma_n) \to \infty$ as $n \to \infty$, as shown in Lemma …. Note also that conditions of the form $\langle \mu_2 - \mu_1, \Sigma_1^{-1-\alpha}(\mu_2 - \mu_1) \rangle_{\mathcal{H}} < \infty$ imply that the sequences of local alternatives are limited to smooth enough densities $p_2^n$ around $p_1$.

5. Discussion 

We now illustrate the behaviour of the limiting power of our test statistics against two different types of sequences of local alternatives. Then, we compare the power of our test statistics with the power of the Maximum Mean Discrepancy test statistics proposed by Gretton et al. (2006). Finally, we highlight some links between testing for homogeneity and supervised binary classification.

5.1 Limiting power against local alternatives of KFDA 

We have seen that our test statistic is consistent in power against fixed alternatives, for both regularization schemes $\gamma_n = \gamma$ and $\gamma_n \to 0$. We shall now examine the behaviour of the power of our test statistic against different types of sequences of local alternatives: i) directional alternatives, ii) non-directional alternatives. For this purpose, we consider a specific reproducing kernel, the periodic spline kernel, whose derivation is given below. Indeed, when $\mathbb{P}_1$ is the uniform distribution on $[0, 1]$ and $d\mathbb{P}_2 / d\mathbb{P}_1 = 1 + \eta c_q$, where $c_q$ is a one-component contamination on the Fourier basis, we may conveniently compute closed-form equivalents, as $n \to \infty$, of the eigenvalues of the covariance operator $\Sigma_1$, and therefore of the power function of the test statistics.



Periodic spline kernel. The periodic spline kernel, described in (Wahba, 1990, Chapter 2), is defined as follows. Any function $f$ in $L^2(\mathcal{X})$, where $\mathcal{X}$ is taken as the torus $\mathbb{R}/2\pi\mathbb{Z}$, may be expressed in the form of a Fourier series expansion $f(t) = \sum_{p=0}^{\infty} a_p c_p(t)$, where $\sum_{p=0}^{\infty} a_p^2 < \infty$ and, for all $\ell \ge 1$,

$$c_0(t) = 1_{\mathcal{X}}(t), \tag{17}$$

$$c_{2\ell-1}(t) = \sqrt{2} \sin(2\pi(2\ell - 1)t), \tag{18}$$

$$c_{2\ell}(t) = \sqrt{2} \cos(2\pi(2\ell - 1)t). \tag{19}$$

Let us consider the family of RKHS defined by $\mathcal{H}_m = \{f : f \in L^2(\mathcal{X}),\ \sum_{p \ge 1} \lambda_p^{-1} a_p^2 < \infty\}$ with $m \ge 1$, where $\lambda_p = (2\pi p)^{-2m}$ for all $p \ge 1$, whose norm is defined for all $f \in \mathcal{H}_m$ as

$$\|f\|_{\mathcal{H}_m}^2 = \sum_{p \ge 1} \lambda_p^{-1} a_p^2. \tag{20}$$

Therefore, the associated reproducing kernel $k_m(x, y)$ writes as

$$k_m(x, y) = \sum_{p \ge 1} 2 (2\pi p)^{-2m} \cos\big(2\pi p (x - y)\big) = \frac{(-1)^{m-1}}{(2m)!} B_{2m}\big((x - y) - \lfloor x - y \rfloor\big),$$

where $B_{2m}$ is the $2m$-th Bernoulli polynomial.
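The closed form rests on the classical Fourier expansion of Bernoulli polynomials, $\sum_{p \ge 1} 2(2\pi p)^{-2m}\cos(2\pi p t) = (-1)^{m-1} B_{2m}(t)/(2m)!$ for $t \in [0, 1)$. The minimal sketch below checks it numerically, assuming the period-one convention (sine and cosine of the same frequency $p$ grouped into the cosine of $x - y$). Bernoulli numbers are computed exactly with `fractions.Fraction`. The helper names are ours.

```python
from fractions import Fraction
from math import comb, factorial

import numpy as np


def bernoulli_numbers(n):
    """B_0, ..., B_n (convention B_1 = -1/2), from sum_{j=0}^{k} C(k+1, j) B_j = 0."""
    B = [Fraction(0)] * (n + 1)
    B[0] = Fraction(1)
    for k in range(1, n + 1):
        B[k] = -sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1)
    return B


def bernoulli_poly(n, x):
    """B_n(x) = sum_k C(n, k) B_k x^{n-k}."""
    B = bernoulli_numbers(n)
    return sum(float(comb(n, k) * B[k]) * x ** (n - k) for k in range(n + 1))


def k_closed_form(m, x, y):
    t = (x - y) - np.floor(x - y)  # fractional part of x - y, in [0, 1)
    return (-1) ** (m - 1) / factorial(2 * m) * bernoulli_poly(2 * m, t)


def k_series(m, x, y, n_terms=100_000):
    """Truncated series sum_{p <= n_terms} 2 (2 pi p)^{-2m} cos(2 pi p (x - y))."""
    p = np.arange(1, n_terms + 1)
    return np.sum(2.0 * np.cos(2 * np.pi * p * (x - y)) / (2 * np.pi * p) ** (2 * m))


print(k_closed_form(1, 0.0, 0.0), k_series(1, 0.0, 0.0))  # both close to 1/12
```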

The set $\{e_p(t), p \ge 1\}$ is actually an orthonormal basis of $\mathcal{H}_m$, where $e_p(t) = \lambda_p^{1/2} c_p(t)$ for all $p \ge 1$. Let us consider $\mathbb{P}_1$ the uniform probability measure on $[0, 1]$. We have $e_p - \mathbb{E}_{\mathbb{P}_1}[e_p] = e_p$ and $\mu_1 = 0$, where $\mu_1$ is the mean element associated with $\mathbb{P}_1$. Hence, $\{(\lambda_p, e_p(t)), p \ge 1\}$ is an eigenbasis of $\Sigma_1$, the covariance operator associated with $\mathbb{P}_1$, where for all $\ell \ge 1$

$$\lambda_0 = 1, \tag{21}$$

$$\lambda_{2\ell-1} = \big(2\pi(2\ell - 1)\big)^{-2m}, \tag{22}$$

$$\lambda_{2\ell} = (4\pi\ell)^{-2m}. \tag{23}$$

Note that the parameter $m$ characterizes the RKHS $\mathcal{H}_m$ and its associated reproducing kernel $k_m(\cdot, \cdot)$, and therefore controls the rate of decay of the eigenvalues of the covariance operator $\Sigma_1$. Indeed, by Lemma 20, we have $d_1(\Sigma_1, \gamma_n) \sim C_1 \gamma_n^{-1/2m}$ and $d_2(\Sigma_1, \gamma_n) \sim C_2 \gamma_n^{-1/4m}$ for some constants $C_1, C_2 > 0$ as $n \to \infty$.

Directional alternatives. Let us consider the limiting power of our test statistics in the following setting:

$$H_0: \mathbb{P}_1 = \mathbb{P}_2^n \quad \text{against} \quad H_A^n: \mathbb{P}_1 \neq \mathbb{P}_2^n, \quad \text{with } \mathbb{P}_2^n \text{ such that } d\mathbb{P}_2^n / d\mathbb{P}_1 = 1 + A n^{-1/2} c_q, \tag{24}$$

where $\mathbb{P}_1$ is the uniform probability measure on $[0, 1]$, and $c_q(t)$ is defined in (17)-(19). In the case $\gamma_n = \gamma$, given a significance level $\alpha \in (0, 1)$, the associated critical level $t_{1-\alpha}$ is defined as satisfying

$$\mathbb{P}\left( 2^{-1/2} d_2^{-1}(\Sigma_1, \gamma) \sum_{p=1}^{\infty} (\lambda_p(\Sigma_1) + \gamma)^{-1} \lambda_p(\Sigma_1) \{Z_p^2 - 1\} > t_{1-\alpha} \right) = \alpha.$$

Note that $a_{n,p}(\gamma) = 0$ for all $p \ge 1$ (from Theorem 6) except for $p = q$, where

$$a_{n,q}(\gamma) = \sqrt{n_1 n_2 / n^2}\, (\lambda_q + \gamma)^{-1/2} \lambda_q^{1/2} A.$$

Figure 1: Evolution of the power of KFDA as $\gamma = 1, 10^{-1}, \dots, 10^{-9}$, for $q$-th component alternatives with (from left to right) $q = 1, 5, 9$.



In order to analyze the behaviour of the power for varying values of $\gamma$ and for different values of $q$, we compute the limiting power when taking $m = 2$ in the periodic reproducing kernel and $q = 1, 5, 9$, and investigate the evolution of the power as a function of the regularization parameter $\gamma$. As Figure 1 shows, our test statistic has trivial power, that is, equal to $\alpha$, when $\gamma \gg \lambda_q$, while it reaches strictly nontrivial power as long as $\gamma \ll \lambda_q$. This motivates the study of the decaying regularization scheme $\gamma_n \to 0$ of our test statistic, in order to incorporate the regime $\gamma \to 0$ into our large-sample framework. In the next paragraph, we shall demonstrate that the version of our test statistic with decaying regularization parameter $\gamma_n \to 0$ reaches high power against a broader class of local alternatives, which we call non-directional alternatives, where $q = q_n \to \infty$, as opposed to directional alternatives, where $q$ was kept constant. Yet, for the test statistic $\hat{T}_n(\gamma_n)$ to have nontrivial power against such sequences of local alternatives, the non-directional sequence of local alternatives has to converge to the null at a slower rate than $n^{-1/2}$.
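The qualitative behaviour shown in Figure 1 can be reproduced with a short Monte Carlo simulation of the limit in Theorem 6, where only the $q$-th shift is non-zero. This is a sketch under stated assumptions, not the computation behind Figure 1: we take hypothetical eigenvalues $\lambda_p = p^{-4}$ (an $m = 2$ type decay, up to constants), truncated at 200 terms, and a shift $a_q = c\,(\lambda_q/(\lambda_q + \gamma))^{1/2}$ with $c = 5$ absorbing $A$ and the sample proportions. The same normal draws are used under the null and under the alternative (common random numbers), which makes the comparison less noisy.

```python
import numpy as np


def limiting_power(eigvals, gamma, shifts, alpha=0.05, n_mc=10_000, seed=0):
    """Monte Carlo power of the Theorem 6 limit, with shifts a_{n,p}(gamma) as in (13)."""
    lam = np.asarray(eigvals, dtype=float)
    w = lam / (lam + gamma)
    scale = np.sqrt(2.0) * np.sqrt(np.sum(w ** 2))  # 2^{1/2} d_2(Sigma_1, gamma)
    Z = np.random.default_rng(seed).standard_normal((n_mc, lam.size))
    null = ((Z ** 2 - 1.0) @ w) / scale  # draws of the null limit (10)
    alt = (((Z + shifts) ** 2 - 1.0) @ w) / scale  # same draws, shifted
    critical = np.quantile(null, 1.0 - alpha)  # Monte Carlo t_{1-alpha}
    return np.mean(alt > critical)


# Hypothetical setting: lambda_p = p^{-4}, q = 5, a_q = c (lambda_q / (lambda_q + gamma))^{1/2}.
lam = np.arange(1, 201, dtype=float) ** -4.0
q, c = 5, 5.0


def shifts_for(gamma):
    a = np.zeros_like(lam)
    a[q - 1] = c * np.sqrt(lam[q - 1] / (lam[q - 1] + gamma))
    return a


for ratio in (1e3, 1.0, 1e-3):
    gamma = ratio * lam[q - 1]
    power = limiting_power(lam, gamma, shifts_for(gamma))
    print(f"gamma = {ratio:g} * lambda_q: limiting power ~ {power:.3f}")
```

In this setting the power stays near $\alpha$ when $\gamma \gg \lambda_q$ and becomes large once $\gamma \ll \lambda_q$, in line with the discussion of Figure 1.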
| Tails | $\lambda_p(\Sigma)$ | $d_1(\Sigma, \gamma)$ | $d_2(\Sigma, \gamma)$ |
|---|---|---|---|
| Normal tails | $O(\exp(-c p^{1/d}))$ | $O(\log^d(1/\gamma))$ | $O(\log^{d/2}(1/\gamma))$ |
| Polynomial tails | $O(p^{-\beta})$ for any $\beta > \alpha$ | $O(\gamma^{-1/\beta})$ | $O(\gamma^{-1/2\beta})$ |

Table 1: Examples of rates of convergence for Gaussian kernels on $\mathcal{X} = \mathbb{R}^d$.



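Non-directional alternatives. Now, we consider the limiting power of our test statistics in the following setting:

$$H_0: \mathbb{P}_1 = \mathbb{P}_2^n \quad \text{against} \quad H_A^n: \mathbb{P}_1 \neq \mathbb{P}_2^n, \quad \text{with } \mathbb{P}_2^n \text{ such that } d\mathbb{P}_2^n / d\mathbb{P}_1 = 1 + \eta_n c_{q_n}. \tag{25}$$

Assume $\mathbb{P}_1$ is the uniform probability measure on $[0, 1]$, and consider again the periodic spline kernel of order $2m$. Take $\{q_n\}_{n \ge 1}$ a nonnegative nondecreasing sequence of integers. Now, if the sequence of local alternatives converges to the null at rate $\eta_n = (2A)^{1/2} \cdots n^{-1/2}$ for some $A > 0$, with $q_n = o(n^{1/(1+4m)})$ for our asymptotic analysis to hold, then as long as $\gamma_n \asymp \lambda_{q_n} \asymp q_n^{-2m}$ we have

$$\mathbb{P}_{H_A^n}\big(\hat{T}_n(\gamma_n) > z_{1-\alpha}\big) = \mathbb{P}\big(Z_1 + \rho_1 \rho_2 A > z_{1-\alpha}\big) + o(1) = 1 - \Phi(z_{1-\alpha} - \rho_1 \rho_2 A) + o(1),$$

where $Z_1$ is a standard normal variable, $\Phi$ denotes the standard normal cumulative distribution function, and we used Lemma 20 together with Theorem 7. On the other hand, if $\gamma_n^{-1} = o(q_n^{2m})$, then the limiting power is trivial and equal to $\alpha$.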
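The limiting power $1 - \Phi(z_{1-\alpha} - \rho_1 \rho_2 A)$ is straightforward to evaluate with the Python standard library. A minimal sketch (the function name is ours), using $\rho_2 = 1 - \rho_1$: at $A = 0$ it returns the level $\alpha$, and it reaches $1/2$ exactly when $\rho_1 \rho_2 A = z_{1-\alpha}$.

```python
from statistics import NormalDist


def nondirectional_limiting_power(alpha, rho1, A):
    """1 - Phi(z_{1-alpha} - rho_1 rho_2 A), with rho_2 = 1 - rho_1."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)  # z_{1-alpha}
    return 1.0 - std_normal.cdf(z - rho1 * (1.0 - rho1) * A)


for A in (0.0, 2.0, 8.0, 16.0):
    print(A, round(nondirectional_limiting_power(0.05, 0.5, A), 3))
```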

Back to the fixed-regularization test statistic $\hat{T}_n(\gamma)$, we may also compute its limiting power against the non-directional sequence of local alternatives defined in (25), by taking into account Remark 17 to use Theorem 6. Indeed, as $n$ tends to infinity, since $a_{n,q_n}(\gamma) = (\rho_1 \rho_2)^{1/2} (\lambda_{q_n} + \gamma)^{-1/2} \lambda_{q_n} \eta_n$ tends to zero, the fixed-regularization version $\hat{T}_n(\gamma)$ of the test statistic has trivial power against non-directional alternatives.

Remark 9. We analyzed the limiting power of our test statistics in the specific case where $P_1$ is the uniform distribution on $[0,1]$ and the reproducing kernel belongs to the family of periodic spline kernels. Yet, our findings carry over to more general settings, as illustrated by Table 1. Indeed, for general distributions with polynomial decay in the tail and (nonperiodic) Gaussian kernels, the eigenvalues of the covariance operator still exhibit a behaviour similar to that of the example treated above.

We now discuss the links between our procedure and the previously proposed Maximum Mean Discrepancy (MMD) test statistics. We also highlight interesting links with supervised kernel-based classification.



5.2 Comparison with Maximum Mean Discrepancy 

Our test statistics share many similarities with the Maximum Mean Discrepancy test statistics of Gretton et al. (2006). In the case $\gamma_n = \gamma$, both have a limiting null distribution which may be expressed as an infinite weighted mixture of chi-squared random variables. Yet, while

$$\hat T_n^{\mathrm{MMD}} \xrightarrow{\mathcal{D}} C\sum_{p=1}^\infty \lambda_p (Z_p^2 - 1),$$

where $\hat T_n^{\mathrm{MMD}}$ denotes the test statistics used by MMD, we have in our case

$$\hat T_n^{\mathrm{KFDA}}(\gamma_n) \xrightarrow{\mathcal{D}} C\sum_{p=1}^\infty (\lambda_p + \gamma_n)^{-1}\lambda_p (Z_p^2 - 1).$$

Roughly speaking, the test statistics based on KFDA uniformly weights the components associated with the first eigenvalues of the covariance operator, and downweights the remaining ones, which makes it possible to gain power by focusing on a user-tunable number of components of the covariance operator. On the other hand, the test statistics based on MMD is naturally sensitive to differences lying on the first components, and gets progressively less sensitive to differences in higher components. Thus, our test statistics based on KFDA gives equal weights to differences lying in (almost) all components, the effective number of components on which the test statistics focuses being tuned via the regularization parameter $\gamma_n$. These differences may be illustrated by considering the behaviour of MMD against directional (fixed-frequency) and non-directional sequences of local alternatives, for periodic kernels.
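To make the weighting argument concrete, the sketch below compares, component by component, the weight $\lambda_q$ appearing in the MMD limit with the weight $\lambda_q/(\lambda_q+\gamma)$ appearing in the KFDA limit. It assumes the periodic-spline eigenvalue decay $\lambda_q = q^{-2m}$ used in the examples above (normalising constants dropped); the helper name is hypothetical.

```python
def component_weights(q_values, m, gamma):
    """Return (q, MMD weight, KFDA weight) rows, assuming lambda_q = q^(-2m)."""
    rows = []
    for q in q_values:
        lam = float(q) ** (-2 * m)             # assumed eigenvalue of the q-th component
        rows.append((q, lam, lam / (lam + gamma)))
    return rows


if __name__ == "__main__":
    for gamma in (1.0, 1e-6):
        print(f"gamma = {gamma:g}")
        for q, w_mmd, w_kfda in component_weights([1, 5, 9], m=2, gamma=gamma):
            print(f"  q={q}: MMD weight={w_mmd:.2e}  KFDA weight={w_kfda:.3f}")
```

For small $\gamma$ the KFDA weights of components 1, 5 and 9 are all close to one, whereas the MMD weight of the ninth component is several orders of magnitude smaller than that of the first; a large $\gamma$ makes the KFDA weights decay like the MMD ones.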



Directional alternatives. Let us consider the setting defined in (24). By a similar reasoning, we may also compute the limiting power of $\hat T_n^{\mathrm{MMD}}$ against directional sequences of local alternatives, with a periodic spline kernel of order $m = 2$, for different components $q = 1, 5, 9$. Both the KFDA and the MMD test statistics reach high power when the sequences of local alternatives lie on the first component. However, the power of MMD tumbles for higher-order alternatives, whereas the power of KFDA remains strictly nontrivial for high-order alternatives as long as $\gamma$ is sufficiently small.




[Figure 2: three panels plotting power against $-\log\gamma$, for $q = 1$, $q = 5$ and $q = 9$.]

Figure 2: Comparison of the evolution of the power of KFDA versus the power of MMD as $\gamma = 1, 10^{-1}, \ldots, 10^{-9}$, for $q$-th component alternatives with (from left to right) $q = 1, 5, 9$.



Non-directional alternatives. Now, consider sequences of local alternatives as defined in (25). The MMD test statistics does not detect such alternatives. Therefore, MMD has trivial power, equal to $\alpha$, against non-directional alternatives.



5.3 Links with supervised classification 

When the two samples have equal sizes, that is when $n_1 = n_2$, KFDA is known to be equivalent to Kernel Ridge Regression (KRR), also referred to as smoothing spline regression in statistics. In this case, KRR performs a kernel-based least-squares regression fit on the labels, where the samples are respectively labelled $-1$ and $+1$. The recentering parameter $d_1(\Sigma_1,\gamma_n)$ in our procedure coincides with the so-called degrees of freedom in smoothing spline regression, which have often been advocated as a relevant measure of complexity for model selection (see Efron, 2004). In particular, since the mean-shift in the limiting normal distribution against local alternatives is lower-bounded by $n d_2^{-1}(\Sigma_1,\gamma_n)\langle \mu_2 - \mu_1, (\Sigma_1 + \gamma_n I)^{-1}(\mu_2 - \mu_1)\rangle_{\mathcal{H}}$, this suggests an algorithm for selecting $\gamma_n$ and the kernel: for a fixed number of degrees of freedom $d_1(\Sigma_1,\gamma_n)$, maximizing the asymptotic mean-shift (which corresponds to the class separation) is likely to yield greater power. As future work, we plan to investigate, both theoretically and practically, the use of (single and multiple) kernel learning procedures as developed by Bach et al. (2004) for maximizing the expected power of our test statistics in specific applications.
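The selection idea above can be prototyped once a spectrum is available. The sketch below is a hypothetical illustration, not a procedure from the paper: each candidate kernel is summarised by its eigenvalues $\lambda_p$ and by coefficients $c_p = \langle\mu_2 - \mu_1, e_p\rangle^2_{\mathcal{H}}$ that are assumed known (in practice they would have to be estimated). For each candidate, $\gamma$ is tuned by bisection so that $d_1(\Sigma_1,\gamma)$ equals a prescribed number of degrees of freedom, and the candidate with the largest proxy $n d_2^{-1}(\Sigma_1,\gamma)\sum_p c_p/(\lambda_p+\gamma)$ for the mean-shift lower bound is kept.

```python
import math


def d1(eigs, gamma):
    return sum(lam / (lam + gamma) for lam in eigs)


def d2(eigs, gamma):
    return math.sqrt(sum((lam / (lam + gamma)) ** 2 for lam in eigs))


def gamma_for_dof(eigs, target, lo=1e-12, hi=1e12, iters=200):
    """Geometric bisection: d1 is decreasing in gamma, so find gamma with d1(gamma) = target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if d1(eigs, mid) > target:
            lo = mid          # too many degrees of freedom: increase gamma
        else:
            hi = mid
    return math.sqrt(lo * hi)


def shift_proxy(eigs, coeffs, gamma, n):
    """n * d2^{-1} * sum_p c_p / (lam_p + gamma), i.e. n d2^{-1} <delta, (Sigma + gamma I)^{-1} delta>."""
    return n * sum(c / (lam + gamma) for lam, c in zip(eigs, coeffs)) / d2(eigs, gamma)


def select_kernel(candidates, target_dof, n):
    """candidates: list of (name, eigenvalues, coeffs); return (name, gamma, proxy) of the best one."""
    best = None
    for name, eigs, coeffs in candidates:
        gamma = gamma_for_dof(eigs, target_dof)
        score = shift_proxy(eigs, coeffs, gamma, n)
        if best is None or score > best[2]:
            best = (name, gamma, score)
    return best
```

Fixing $d_1$ first puts all candidates on the same complexity scale, so that the comparison of mean-shift proxies is not biased towards kernels that simply have more effective components. The bisection assumes the target lies strictly between $0$ and the number of eigenvalues supplied.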

6. Experiments 

In this section, we investigate the experimental performance of our KFDA test statistic, and compare its power with that of other nonparametric test statistics.

6.1 Speaker verification 

We conducted experiments in a speaker verification task (Bimbot et al., 2004), on a subset of 8 female speakers using data from the NIST 2004 Speaker Recognition Evaluation. We refer the reader, for instance, to Louradour et al. (2007) for details on the pre-processing of the data. Figure 3 shows results averaged over all pairs of speakers. For each pair of speakers, at each run we took 3000 samples of each speaker and launched our KFDA test to decide whether the samples come from the same speaker or not, and computed the type II error by comparing the prediction to the ground truth. We averaged the results over 100 runs for each pair, and over all pairs of speakers. The level was set to $\alpha = 0.05$, and the critical values were computed by a bootstrap resampling procedure. Since the observations may be considered dependent within the sequences, and independent between the sequences, we used a fixed-block variant of the bootstrap, which consists in using bootstrap samples built by piecing together several bootstrap samples drawn in each sequence. We performed the same experiments for the Maximum Mean Discrepancy and the Tajvidi-Hall (TH) test statistics. We summarized the results by plotting the ROC curves of all competing methods. Our method reaches good empirical power for a small value of the prescribed level ($1 - \beta = 90\%$ for $\alpha = 0.05\%$). Maximum Mean Discrepancy also yields good empirical performance on this task.
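The resampling scheme described above can be sketched as follows. The paper does not specify the block length or how blocks are drawn, so this is an assumption-laden illustration: each sequence is cut into non-overlapping blocks of a fixed (hypothetical) length, whole blocks are drawn with replacement within that sequence, and the resampled sequences are then pieced together. Critical values would be obtained by recomputing the test statistic on many such resamples and taking an empirical quantile.

```python
import random


def fixed_block_resample(seq, block_len, rng):
    """Fixed (non-overlapping) block bootstrap of a single dependent sequence."""
    blocks = [seq[i:i + block_len] for i in range(0, len(seq), block_len)]
    out = []
    while len(out) < len(seq):
        out.extend(rng.choice(blocks))   # draw whole blocks to keep short-range dependence
    return out[:len(seq)]                # trim to the original length


def pieced_bootstrap_sample(sequences, block_len, rng):
    """Concatenate independent block-bootstrap resamples drawn within each sequence."""
    sample = []
    for seq in sequences:
        sample.extend(fixed_block_resample(seq, block_len, rng))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    sequences = [[1, 2, 3, 4, 5, 6], [10, 20, 30]]
    print(pieced_bootstrap_sample(sequences, block_len=2, rng=rng))
```

Resampling within each sequence, rather than across sequences, respects the assumption that observations are dependent inside a sequence but independent between sequences.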

7. Conclusion 

We proposed a well-calibrated kernel-based test statistic for testing the homogeneity of two samples, built on the kernel Fisher discriminant analysis algorithm, for which we proved that the asymptotic limit distribution under the null hypothesis is the standard normal distribution when the regularization parameter decays to zero at a slower rate than $n^{-1/2}$. Besides, our test statistic can be readily computed from Gram matrices once a reproducing kernel is defined, and reaches nontrivial power against a large class of alternatives under mild conditions on the regularization parameter. Finally, our KFDA-test statistic yields competitive performance for speaker identification purposes.
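The claim that the statistic can be computed from Gram matrices can be illustrated with the following dependency-free sketch. It is our own reconstruction under stated assumptions, not the authors' code: it assumes the statistic has the form $\{(n_1n_2/n)\|(\hat\Sigma_W+\gamma I)^{-1/2}(\hat\mu_2-\hat\mu_1)\|^2_{\mathcal{H}} - d_1(\hat\Sigma_W,\gamma)\}/\{\sqrt{2}\,d_2(\hat\Sigma_W,\gamma)\}$, with $\hat\Sigma_W$ the pooled within-sample covariance operator, i.e. the form of (37) with empirical quantities. Writing $N$ for the within-sample centering matrix and $m$ for the vector with entries $-1/n_1$ (first sample) and $1/n_2$ (second sample), the nonzero eigenvalues of $\hat\Sigma_W$ are those of $NKN/n$, and the Woodbury identity gives $\|(\hat\Sigma_W+\gamma I)^{-1/2}\hat\delta\|^2_{\mathcal{H}} = \gamma^{-1}\{m^\top Km - m^\top KN(n\gamma I + NKN)^{-1}NKm\}$. Plain Gauss-Jordan elimination is used so the example runs without NumPy; it is only meant for small $n$.

```python
import math


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x k, nested lists) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kfda_statistic(K, n1, gamma):
    """K: Gram matrix of the pooled sample, whose first n1 rows/columns come from sample 1."""
    n = len(K)
    n2 = n - n1
    m = [-1.0 / n1] * n1 + [1.0 / n2] * n2          # hat(delta) = sum_i m_i k(x_i, .)
    group = [0] * n1 + [1] * n2
    size = (n1, n2)
    # Within-sample centering (a projection): N = blockdiag(I - 11^T / n_g).
    N = [[(1.0 if i == j else 0.0) - (1.0 / size[group[i]] if group[i] == group[j] else 0.0)
          for j in range(n)] for i in range(n)]
    NKN = matmul(matmul(N, K), N)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # R = (NKN/n + gamma I)^{-1} NKN/n has eigenvalues lam / (lam + gamma).
    A = [[NKN[i][j] / n + gamma * eye[i][j] for j in range(n)] for i in range(n)]
    R = solve(A, [[v / n for v in row] for row in NKN])
    d1 = sum(R[i][i] for i in range(n))
    d2 = math.sqrt(sum(R[i][j] * R[j][i] for i in range(n) for j in range(n)))
    # Woodbury form of <delta, (Sigma_W + gamma I)^{-1} delta>_H.
    Km = [sum(K[i][j] * m[j] for j in range(n)) for i in range(n)]
    NKm = [sum(N[i][j] * Km[j] for j in range(n)) for i in range(n)]
    B = [[NKN[i][j] + n * gamma * eye[i][j] for j in range(n)] for i in range(n)]
    x = solve(B, [[v] for v in NKm])
    mKm = sum(mi * v for mi, v in zip(m, Km))
    quad = (mKm - sum(u * row[0] for u, row in zip(NKm, x))) / gamma
    return (n1 * n2 / n * quad - d1) / (math.sqrt(2.0) * d2)


if __name__ == "__main__":
    def gaussian_gram(xs, bw=1.0):
        return [[math.exp(-(u - v) ** 2 / (2 * bw ** 2)) for v in xs] for u in xs]

    x1, x2 = [0.0, 0.3, 0.5, 0.9], [2.0, 2.2, 2.5]
    print(kfda_statistic(gaussian_gram(x1 + x2), n1=len(x1), gamma=0.1))
```

Only the Gram matrix, the sample sizes and $\gamma$ enter the computation; swapping the roles of the two samples leaves the value unchanged, since $\hat\delta$ only changes sign.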
[Figure 3 plot: ROC Curve]

Figure 3: Comparison of ROC curves in a speaker verification task

8. Proof of some preliminary results 

We preface the proof by some useful results relating the KFDA statistics to kernel-independent quantities.

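Proposition 10. Assume (A1)-(A2). Let $P_1$ and $P_2$ be two probability distributions on $(\mathcal{X},\mathfrak{X})$, and denote by $\mu_1, \mu_2$ the associated mean elements (see (1)). Let $Q$ be a probability measure dominating $P_1$ and $P_2$, and let $\Sigma$ be the associated covariance operator (see (2)). Then,

$$\left\| \frac{dP_1}{dQ} - \frac{dP_2}{dQ} \right\|_{L^2(Q)} < \infty$$

if and only if the vector $(\mu_2 - \mu_1) \in \mathcal{H}$ belongs to the range of the square root $\Sigma^{1/2}$. In addition,

$$\langle \mu_2 - \mu_1, \Sigma^{-1}(\mu_2 - \mu_1)\rangle_{\mathcal{H}} = \left\| \frac{dP_1}{dQ} - \frac{dP_2}{dQ} \right\|^2_{L^2(Q)}. \tag{26}$$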



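Proof. Denote by $\{\lambda_k\}_{k\geq1}$ and $\{e_k\}_{k\geq1}$ the strictly positive eigenvalues and the corresponding eigenvectors of the covariance operator $\Sigma$, respectively. For $k \geq 1$, set

$$f_k = \lambda_k^{-1/2}\{e_k - Qe_k\}. \tag{27}$$

By construction, for any $k, l \geq 1$,

$$\lambda_k\delta_{k,l} = \langle e_k, \Sigma e_l\rangle_{\mathcal{H}} = \langle e_k - Qe_k, e_l - Qe_l\rangle_{L^2(Q)} = \lambda_k^{1/2}\lambda_l^{1/2}\langle f_k, f_l\rangle_{L^2(Q)},$$

where $\delta_{k,l}$ is Kronecker's delta. Hence $\{f_k\}_{k\geq1}$ is an orthonormal system of $L^2(Q)$. Note that $\mu_2 - \mu_1$ belongs to the range of $\Sigma^{1/2}$ if and only if

(a) $\langle \mu_2 - \mu_1, g\rangle_{\mathcal{H}} = 0$ for all $g$ in the null space of $\Sigma$,

(b) $\langle \mu_1 - \mu_2, \Sigma^{-1}(\mu_1 - \mu_2)\rangle_{\mathcal{H}} = \sum_{p=1}^\infty \lambda_p^{-1}\langle e_p, \mu_1 - \mu_2\rangle^2_{\mathcal{H}} < \infty$.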
Consider first condition (a). For any $g \in \mathcal{H}$, it follows from the definitions that

$$\langle \mu_2 - \mu_1, g\rangle_{\mathcal{H}} = \int (dP_2 - dP_1)\, g = \int (dP_2 - dP_1)(g - Qg) = \left\langle \frac{dP_2}{dQ} - \frac{dP_1}{dQ}, g - Qg\right\rangle_{L^2(Q)}.$$

If $g$ belongs to the null space of $\Sigma$, then $\|g - Qg\|^2_{L^2(Q)} = \langle g, \Sigma g\rangle_{\mathcal{H}} = 0$, and the previous relation implies that $\langle \mu_2 - \mu_1, g\rangle_{\mathcal{H}} = 0$. Consider now (b).



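$$\sum_{p=1}^\infty \lambda_p^{-1}\langle \mu_1 - \mu_2, e_p\rangle^2_{\mathcal{H}} = \sum_{p=1}^\infty \lambda_p^{-1}\left(\int \{dP_1(x) - dP_2(x)\}\, e_p(x)\right)^2 = \sum_{p=1}^\infty \left\langle \frac{dP_1}{dQ} - \frac{dP_2}{dQ}, f_p\right\rangle^2_{L^2(Q)} \leq \left\| \frac{dP_1}{dQ} - \frac{dP_2}{dQ}\right\|^2_{L^2(Q)}, \tag{28}$$

where the last inequality is Bessel's inequality for the orthonormal system $\{f_p\}_{p\geq1}$.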



In order to prove the equality, we simply notice that, because of the density of the RKHS $\mathcal{H}$ in $L^2(Q)$, $\{f_k\}_{k\geq1}$ is a complete orthonormal basis of the space of functions $L^2_0(Q)$ defined as

$$L^2_0(Q) \stackrel{\mathrm{def}}{=} \left\{ g \in L^2(Q),\ \int (g - Qg)^2\, dQ > 0 \text{ and } \ldots \right\}. \tag{29}$$

□
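Identity (26) can be checked by hand on a finite input space. The sketch below assumes $\mathcal{X} = \{x_1,\ldots,x_m\}$ with a strictly positive definite Gaussian Gram matrix $K$, so that the RKHS contains all functions on $\mathcal{X}$ and $\langle f, g\rangle_{\mathcal{H}} = f^\top K^{-1}g$. Then $\mu_2 - \mu_1 = K(p_2 - p_1)$, the covariance operator under $Q = (q_1,\ldots,q_m)$ acts as $g \mapsto KCg$ with $C = \mathrm{diag}(q) - qq^\top$, and $h = (p_2 - p_1)/q$ (componentwise) is a candidate for $\Sigma^{-1}(\mu_2 - \mu_1)$. The code verifies $\Sigma h = \mu_2 - \mu_1$ numerically for a given bandwidth and compares $\langle \mu_2 - \mu_1, h\rangle_{\mathcal{H}}$ with $\|dP_2/dQ - dP_1/dQ\|^2_{L^2(Q)}$.

```python
import math


def prop10_check(points, q, p1, p2, bandwidth):
    """Return (residual of Sigma h = mu2 - mu1, <mu2 - mu1, h>_H, ||dP2/dQ - dP1/dQ||^2_{L2(Q)})."""
    m = len(points)
    K = [[math.exp(-(a - b) ** 2 / (2.0 * bandwidth ** 2)) for b in points] for a in points]
    diff = [b - a for a, b in zip(p1, p2)]                              # masses of P2 - P1
    delta = [sum(K[i][j] * diff[j] for j in range(m)) for i in range(m)]  # mu2 - mu1 = K diff
    h = [d / qi for d, qi in zip(diff, q)]                              # candidate Sigma^{-1} delta
    qh = sum(qi * hi for qi, hi in zip(q, h))
    ch = [qi * hi - qi * qh for qi, hi in zip(q, h)]                    # (diag(q) - q q^T) h
    sigma_h = [sum(K[i][j] * ch[j] for j in range(m)) for i in range(m)]
    residual = max(abs(u - v) for u, v in zip(sigma_h, delta))
    # <delta, h>_H = delta^T K^{-1} h, which reduces to diff^T h because delta = K diff.
    rkhs_value = sum(d * hi for d, hi in zip(diff, h))
    l2_value = sum(qi * (d / qi) ** 2 for d, qi in zip(diff, q))
    return residual, rkhs_value, l2_value


if __name__ == "__main__":
    pts, q = [0.0, 1.0, 2.0, 3.0], [0.25] * 4
    p1, p2 = [0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]
    for bw in (0.5, 1.0):
        print(bw, prop10_check(pts, q, p1, p2, bw))
```

The residual is at rounding level for every bandwidth, and the value of the quadratic form coincides with the $L^2(Q)$ distance between the densities, which is the kernel-independence expressed by (26).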



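Lemma 11. Assume (A1)-(A2). Let $P_1$ and $P_2$ be two probability distributions on $(\mathcal{X},\mathfrak{X})$ such that $P_2 \ll P_1$. Denote by $\Sigma_1$ and $\Sigma_2$ the associated covariance operators. Then, for any $\gamma > 0$,

$$\left\| I - \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\right\|^2_{HS} \leq 4\left\|\frac{dP_2}{dP_1} - 1\right\|^2_{L^2(P_1)}, \tag{30}$$

$$\left|\mathrm{Tr}\{(\Sigma_1 + \gamma I)^{-1}(\Sigma_2 - \Sigma_1)\}\right| \leq 2\, d_2(\Sigma_1,\gamma)\left\|\frac{dP_2}{dP_1} - 1\right\|_{L^2(P_1)}, \tag{31}$$

where $d_2(\Sigma_1,\gamma)$ is defined in …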



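Proof. Denote by $\{\lambda_k\}_{k\geq1}$ and $\{e_k\}_{k\geq1}$ the strictly positive eigenvalues and the corresponding eigenvectors of the covariance operator $\Sigma_1$. Note that $\langle e_k, \Sigma_1 e_l\rangle_{\mathcal{H}} = \lambda_k\delta_{k,l}$ for all $k$ and $l$. Let us denote $f_k = \lambda_k^{-1/2}\{e_k - P_1e_k\}$. Then, we have $\langle f_k, f_l\rangle_{L^2(P_1)} = \delta_{k,l}$. Note that

$$\left\| I - \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\right\|^2_{HS} = \sum_{k,l=1}^\infty \left|\delta_{k,l} - \lambda_k^{-1/2}\lambda_l^{-1/2}\langle e_k, \Sigma_2 e_l\rangle_{\mathcal{H}}\right|^2$$

$$= \sum_{k,l=1}^\infty \left| \left\langle f_kf_l, 1 - \frac{dP_2}{dP_1}\right\rangle_{L^2(P_1)} + \lambda_k^{-1/2}\lambda_l^{-1/2}\langle \mu_2 - \mu_1, e_k\rangle_{\mathcal{H}}\langle \mu_2 - \mu_1, e_l\rangle_{\mathcal{H}}\right|^2.$$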
Then, using that $(a+b)^2 \leq 2(a^2 + b^2)$, and (28) in Proposition 10 with $\Sigma = \Sigma_1$, we obtain

$$\left\| I - \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\right\|^2_{HS} \leq 4\left\|\frac{dP_2}{dP_1} - 1\right\|^2_{L^2(P_1)}. \tag{32}$$



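Denote, for all $p, q \geq 1$,

$$\varepsilon_{p,q} \stackrel{\mathrm{def}}{=} \left\langle e_p, (\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - I)e_q\right\rangle_{\mathcal{H}}. \tag{33}$$

By applying the Hölder inequality, and using (30), we get

$$\left|\mathrm{Tr}\{(\Sigma_1+\gamma I)^{-1}(\Sigma_2 - \Sigma_1)\}\right| = \left|\sum_{p=1}^\infty \langle e_p, (\Sigma_1+\gamma I)^{-1}\Sigma_1 e_p\rangle_{\mathcal{H}}\,\varepsilon_{p,p}\right|$$

$$\leq \left(\sum_{p=1}^\infty \langle e_p, (\Sigma_1+\gamma I)^{-1}\Sigma_1e_p\rangle^2_{\mathcal{H}}\right)^{1/2}\left(\sum_{p=1}^\infty \varepsilon^2_{p,p}\right)^{1/2} \leq 2\, d_2(\Sigma_1,\gamma)\left\|\frac{dP_2}{dP_1} - 1\right\|_{L^2(P_1)},$$

which completes the proof of (31).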



□ 



Proposition 12. Assume (A1). Let $\{X_1^n, \ldots, X_n^n\}$ be a triangular array of i.i.d. random variables, whose mean element and covariance operator are respectively $(\mu^n, \Sigma^n)$. If, for all $n$, all the eigenvalues $\lambda_p(\Sigma^n)$ of $\Sigma^n$ are non-negative, and if there exists $C > 0$ such that for all $n$ we have $\sum_{p=1}^\infty \lambda_p^{1/2}(\Sigma^n) < C$, then $\sum_{p=1}^\infty |\lambda_p(\hat\Sigma - \Sigma^n)| = O_P(n^{-1/2})$.

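Proof. Lemma 21 shows that, for any orthonormal basis $\{e_p\}_{p\geq1}$ in the RKHS $\mathcal{H}$,

$$\sum_{p=1}^\infty |\lambda_p(\hat\Sigma - \Sigma^n)| \leq \sum_{p=1}^\infty \|(\hat\Sigma - \Sigma^n)e_p\|_{\mathcal{H}}.$$

We take the orthonormal family of eigenvectors $\{e_p\}_{p\geq1}$ of the covariance operator $\Sigma^n$ (associated to the eigenvalues $\lambda_p(\Sigma^n)$ ranked in decreasing order). Then, it suffices to show that $\sum_{p=1}^\infty \|(\hat\Sigma - \Sigma^n)e_p\|_{\mathcal{H}} = O_P(n^{-1/2})$. Note that

$$(\hat\Sigma - \Sigma^n)e_p = n^{-1}\sum_{i=1}^n \zeta_{p,n,i} - \left(n^{-1}\sum_{i=1}^n k(X_i,\cdot)\right)\left(n^{-1}\sum_{i=1}^n \bar e_{p,n}(X_i)\right),$$

where $\bar e_{p,n} = e_p - E^n[e_p(X_1)]$ and

$$\zeta_{p,n,i} = k(X_i,\cdot)\,\bar e_{p,n}(X_i) - E^n\{k(X_1,\cdot)\,\bar e_{p,n}(X_1)\}.$$

By the Minkowski inequality,

$$\left\{E^n\|(\hat\Sigma - \Sigma^n)e_p\|^2_{\mathcal{H}}\right\}^{1/2} \leq \left\{E^n\Big\|n^{-1}\sum_{i=1}^n \zeta_{p,n,i}\Big\|^2_{\mathcal{H}}\right\}^{1/2} + \left\{E^n\Big[\Big\|n^{-1}\sum_{i=1}^n k(X_i,\cdot)\Big\|^2_{\mathcal{H}}\Big|n^{-1}\sum_{i=1}^n \bar e_{p,n}(X_i)\Big|^2\Big]\right\}^{1/2} \stackrel{\mathrm{def}}{=} A_1 + A_2.$$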
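We consider these two terms separately. Consider first $A_1$. We have

$$A_1^2 = n^{-1}E^n\|\zeta_{p,n,1}\|^2_{\mathcal{H}} \leq n^{-1}E^n\{\|k(X_1,\cdot)\|^2_{\mathcal{H}}\,\bar e^2_{p,n}(X_1)\} \leq n^{-1}|k|_\infty E^n[|\bar e_{p,n}(X_1)|^2].$$

Consider now $A_2$. Since $\|n^{-1}\sum_{i=1}^n k(X_i,\cdot)\|^2_{\mathcal{H}} \leq |k|_\infty$, we have

$$A_2^2 \leq n^{-1}|k|_\infty E^n[|\bar e_{p,n}(X_1)|^2].$$

This shows, using the Minkowski inequality, that

$$\left\{E^n\Big[\Big(\sum_{p=1}^\infty \|(\hat\Sigma - \Sigma^n)e_p\|_{\mathcal{H}}\Big)^2\Big]\right\}^{1/2} \leq 2|k|_\infty^{1/2}n^{-1/2}\sum_{p=1}^\infty \{E^n[|\bar e_{p,n}(X_1)|^2]\}^{1/2}.$$

Since, by assumption, $\sum_{p=1}^\infty \{E^n[|\bar e_{p,n}(X_1)|^2]\}^{1/2} = \sum_{p=1}^\infty \lambda_p^{1/2}(\Sigma^n) < C$, the proof is concluded.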



□ 



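Corollary 13. Assume (A1). Let $\{X^{(1)}_{1,n_1}, \ldots, X^{(1)}_{n_1,n_1}\}$ and $\{X^{(2)}_{1,n_2}, \ldots, X^{(2)}_{n_2,n_2}\}$ be two triangular arrays, whose mean elements and covariance operators are respectively $(\mu_1^n, \Sigma_1^n)$ and $(\mu_2^n, \Sigma_2^n)$, where $n_1/n \to \rho_1$ and $n_2/n \to \rho_2$ as $n$ tends to infinity. If $\sup_n \sum_{p=1}^\infty \lambda_p^{1/2}(\Sigma_a^n) < \infty$ for $a = 1, 2$, then

$$\sum_{p=1}^\infty |\lambda_p(\hat\Sigma_W - \Sigma_W)| = O_P(n^{-1/2}). \tag{34}$$

In addition, we also have

$$\|\hat\Sigma_W - \Sigma_W\|_{HS} = O_P(n^{-1/2}). \tag{35}$$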



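Proof. Since $\hat\Sigma_W - \Sigma_W = n_1n^{-1}(\hat\Sigma_1 - \Sigma_1) + n_2n^{-1}(\hat\Sigma_2 - \Sigma_2)$, then

$$\sum_{p=1}^\infty \|(\hat\Sigma_W - \Sigma_W)e_p\|_{\mathcal{H}} \leq n_1n^{-1}\sum_{p=1}^\infty \|(\hat\Sigma_1 - \Sigma_1)e_p\|_{\mathcal{H}} + n_2n^{-1}\sum_{p=1}^\infty \|(\hat\Sigma_2 - \Sigma_2)e_p\|_{\mathcal{H}},$$

and applying Proposition 12 twice leads to (34). Now, using that

$$\|\hat\Sigma_W - \Sigma_W\|_{HS} \leq \sum_{p=1}^\infty |\lambda_p(\hat\Sigma_W - \Sigma_W)|, \tag{36}$$

(35) follows as a direct consequence of (34).

□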



9. Asymptotic approximation of the test statistics 

The following proposition shows that, in the asymptotic study of our test statistics, we can replace most empirical quantities by population quantities. For ease of notation, we shall denote $\mu_2^n - \mu_1$ by $\delta_n$ and $\hat\mu_2 - \hat\mu_1$ by $\hat\delta$.
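Proposition 14. Assume (C). If

$$\gamma_n + d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n)\,\gamma_n^{-1}n^{-1/2} \to 0,$$

$$d_2^{-1}(\Sigma_1,\gamma_n)\,n\eta_n^2 = O(1) \quad\text{and}\quad d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n)\,\eta_n \to 0,$$

then $\hat T_n(\gamma_n) = \tilde T_n(\gamma_n) + o_P(1)$, where

$$\tilde T_n(\gamma) \stackrel{\mathrm{def}}{=} \frac{(n_1n_2/n)\,\|(\Sigma_1 + \gamma I)^{-1/2}\hat\delta\|^2_{\mathcal{H}} - d_1(\Sigma_1,\gamma)}{\sqrt{2}\,d_2(\Sigma_1,\gamma)}. \tag{37}$$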

Proof. Notice that

$$|d_2(\hat\Sigma_W,\gamma_n) - d_2(\Sigma_1,\gamma_n)| \leq |d_2(\hat\Sigma_W,\gamma_n) - d_2(\Sigma_W,\gamma_n)| + |d_2(\Sigma_W,\gamma_n) - d_2(\Sigma_1,\gamma_n)|.$$

Then, on the one hand, using Eq. (77) for $r = 2$ in Lemma 23 with $\Sigma = \Sigma_W$ and $\Delta = \hat\Sigma_W - \Sigma_W$, and Eq. (35) in Corollary 13, we get $d_2(\hat\Sigma_W,\gamma_n) - d_2(\Sigma_W,\gamma_n) = O_P(\gamma_n^{-1}n^{-1/2})$. On the other hand, using Eq. (79) in Lemma 23 with $\Sigma = \Sigma_1$ and $\Delta = n_2n^{-1}(\Sigma_2^n - \Sigma_1)$, we get $d_2(\Sigma_W,\gamma_n) - d_2(\Sigma_1,\gamma_n) = O(\eta_n)$. Furthermore, a similar reasoning, using Eq. (77) and Eq. (75) again in Lemma 23, allows us to prove that $d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\hat\Sigma_W,\gamma_n) = d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n) + o_P(1)$. Next, we shall prove that



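$$\left\|(\hat\Sigma_W + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} = \left\|(\Sigma_1 + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} + n^{-1}O_P\left\{(d_1(\Sigma_1,\gamma_n) + n\eta_n^2)(\gamma_n^{-1}n^{-1/2} + \eta_n)\right\}. \tag{38}$$

Using the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ with $A = \hat\Sigma_W + \gamma_nI$ and $B = \Sigma_1 + \gamma_nI$, together with $B - A = (\Sigma_W - \hat\Sigma_W) + (\Sigma_1 - \Sigma_W)$, we may write

$$\left|\left\|(\hat\Sigma_W + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} - \left\|(\Sigma_1 + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}}\right| \leq A_1A_2\{B_1 + B_2\}, \tag{39}$$

with

$$A_1 \stackrel{\mathrm{def}}{=} \|(\Sigma_1+\gamma_nI)^{-1/2}\hat\delta\|_{\mathcal{H}}, \qquad A_2 \stackrel{\mathrm{def}}{=} \|(\hat\Sigma_W+\gamma_nI)^{-1/2}\hat\delta\|_{\mathcal{H}},$$

$$B_1 \stackrel{\mathrm{def}}{=} \|(\hat\Sigma_W+\gamma_nI)^{-1/2}(\hat\Sigma_W - \Sigma_W)(\Sigma_1+\gamma_nI)^{-1/2}\|, \qquad B_2 \stackrel{\mathrm{def}}{=} \|(\hat\Sigma_W+\gamma_nI)^{-1/2}(\Sigma_W - \Sigma_1)(\Sigma_1+\gamma_nI)^{-1/2}\|.$$

We now prove that

$$A_1^2 = O_P(n^{-1}d_1(\Sigma_1,\gamma_n) + \eta_n^2), \tag{40}$$

$$A_2^2 = O_P(n^{-1}d_1(\Sigma_1,\gamma_n) + \eta_n^2). \tag{41}$$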



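We first consider (40). Note that $E[\hat\delta\otimes\hat\delta] = \delta_n\otimes\delta_n + n_1^{-1}\Sigma_1 + n_2^{-1}\Sigma_2^n$, which yields

$$E\|(\Sigma_1+\gamma_nI)^{-1/2}\hat\delta\|^2_{\mathcal{H}} = \mathrm{Tr}\{(\Sigma_1+\gamma_nI)^{-1}E(\hat\delta\otimes\hat\delta)\}$$

$$= \langle \delta_n, (\Sigma_1+\gamma_nI)^{-1}\delta_n\rangle_{\mathcal{H}} + \frac{n}{n_1n_2}\mathrm{Tr}\{(\Sigma_1+\gamma_nI)^{-1}\Sigma_1\} + n_2^{-1}\mathrm{Tr}\{(\Sigma_1+\gamma_nI)^{-1}(\Sigma_2^n - \Sigma_1)\}. \tag{42}$$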
Using Proposition 10 with $\Sigma = \Sigma_1$ together with Assumption (C), we may write

$$|\langle\delta_n, (\Sigma_1+\gamma_nI)^{-1}\delta_n\rangle_{\mathcal{H}}| \leq \langle\delta_n, \Sigma_1^{-1}\delta_n\rangle_{\mathcal{H}} \leq \left\|\frac{dP_2^n}{dP_1} - 1\right\|^2_{L^2(P_1)}.$$

Next, applying Lemma 11, we obtain

$$|\mathrm{Tr}\{(\Sigma_1+\gamma_nI)^{-1}(\Sigma_2^n - \Sigma_1)\}| = O(d_2(\Sigma_1,\gamma_n)\,\eta_n),$$

which yields

$$E\|(\Sigma_1+\gamma_nI)^{-1/2}\hat\delta\|^2_{\mathcal{H}} = \frac{n}{n_1n_2}\,d_1(\Sigma_1,\gamma_n)\{1 + O(\eta_n)\} + O(\eta_n^2). \tag{43}$$

Finally, we get (40) by the Markov inequality. Now, to prove (41), it suffices to observe that $(\hat\Sigma_W + \gamma_nI)^{-1}(\Sigma_1+\gamma_nI) = I + o_P(1)$, and then conclude from (40). Next, using the upper bound $\|(\Sigma + \gamma_nI)^{-1/2}\| \leq \gamma_n^{-1/2}$ and Corollary 13, which gives $\|\hat\Sigma_W - \Sigma_W\|_{HS} = O_P(n^{-1/2})$, we get

$$B_1 = O_P(\gamma_n^{-1}n^{-1/2}). \tag{44}$$

Finally, under Assumption (C), using Eq. (30) in Lemma 11, we obtain

$$B_2 = O_P(\eta_n). \tag{45}$$

The proof of (38) is concluded by plugging (40), (41), (44) and (45) into (39).

□

Remark 15. For the sake of generality, we proved the approximation result under the assumptions $\gamma_n + d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n)\,\gamma_n^{-1}n^{-1/2} \to 0$ on the one hand, and $d_2^{-1}(\Sigma_1,\gamma_n)\,n\eta_n^2 = O(1)$ and $d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n)\,\eta_n \to 0$ on the other hand. However, in the case $\gamma_n = \gamma$, the approximation is still valid if $\eta_n \to 0$, which allows us to use this approximation to derive the limiting power of our test statistics against non-directional sequences of local alternatives as in (25).



10. Proof of Theorems 6 and 7

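For ease of notation, in the subsequent proofs, we shall often omit $\Sigma_1$ in quantities involving it. Hence, from now on, $\lambda_p$, $\lambda_q$, $d_2$ stand for $\lambda_p(\Sigma_1)$, $\lambda_q(\Sigma_1)$, $d_2(\Sigma_1,\gamma)$. Define

$$Y_{n,p,i} \stackrel{\mathrm{def}}{=} \begin{cases} -\left(\dfrac{n_2}{n\,n_1}\right)^{1/2}\left\{e_p(X_i^{(1)}) - E[e_p(X_1^{(1)})]\right\}, & 1 \leq i \leq n_1, \\[2mm] \left(\dfrac{n_1}{n\,n_2}\right)^{1/2}\left\{e_p(X_{i-n_1}^{(2)}) - E[e_p(X_1^{(2)})]\right\}, & n_1 + 1 \leq i \leq n. \end{cases} \tag{46}$$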



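The following lemma gives formulas for the moments of $Y_{n,p,i}$, used throughout the actual proof of the main results.

Lemma 16. Consider $\{Y_{n,p,i}\}_{1\leq i\leq n,\, p\geq1}$ and $\varepsilon_{p,q}$ as defined respectively in (46) and (33). Then

$$\sum_{i=1}^n E[Y_{n,p,i}Y_{n,q,i}] = \lambda_p^{1/2}\lambda_q^{1/2}\{\delta_{p,q} + n_1n^{-1}\varepsilon_{p,q}\}, \tag{47}$$

$$\mathrm{Cov}(Y^2_{n,p,i}, Y^2_{n,q,i}) \leq Cn^{-2}|k|_\infty\lambda_p^{1/2}\lambda_q^{1/2}(1 + \varepsilon_{p,p})^{1/2}(1 + \varepsilon_{q,q})^{1/2}. \tag{48}$$

Proof. The first expressions are proved by elementary calculations from

$$E[Y_{n,p,1}Y_{n,q,n_1+1}] = 0, \quad\text{since } X_1^{(1)} \perp X_1^{(2)},$$

$$E[Y_{n,p,n_1+1}Y_{n,q,n_1+1}] = \frac{n_1}{n\,n_2}\lambda_p^{1/2}\lambda_q^{1/2}\left\{\delta_{p,q} + \langle e_p, (\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - I)e_q\rangle_{\mathcal{H}}\right\}.$$

Next, notice that, for all $p \geq 1$, we have by the reproducing property and the Cauchy-Schwarz inequality

$$|e_p(x)| = |\langle e_p, k(x,\cdot)\rangle_{\mathcal{H}}| \leq \|e_p\|_{\mathcal{H}}\|k(x,\cdot)\|_{\mathcal{H}} \leq |k|_\infty^{1/2},$$

which yields

$$|\mathrm{Cov}(Y^2_{n,p,i}, Y^2_{n,q,i})| \leq E[Y^2_{n,p,i}Y^2_{n,q,i}] + E[Y^2_{n,p,i}]E[Y^2_{n,q,i}]$$

$$\leq Cn^{-1}|k|_\infty E^{1/2}[Y^2_{n,p,i}]\,E^{1/2}[Y^2_{n,q,i}]$$

$$\leq Cn^{-2}|k|_\infty\lambda_p^{1/2}\lambda_q^{1/2}(1 + \varepsilon_{p,p})^{1/2}(1 + \varepsilon_{q,q})^{1/2}.$$

□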

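10.1 Proof of Theorem 6

Proof. The proof is adapted from (Serfling, 1980, pages 195-199). By Proposition 14,

$$\hat T_n(\gamma) = \frac{V_{n,\infty}(\gamma) - d_1(\Sigma_W,\gamma)}{\sqrt{2}\,d_2(\Sigma_W,\gamma)} + o_P(1),$$

where

$$V_{n,\infty}(\gamma) = \sum_{p=1}^\infty (\lambda_p+\gamma)^{-1}\left(S_{n,p} + \sqrt{\frac{n_1n_2}{n}}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right)^2,$$

with

$$S_{n,p} = \sqrt{\frac{n_1n_2}{n}}\langle\hat\delta - \delta_n, e_p\rangle_{\mathcal{H}} = \sum_{i=1}^n Y_{n,p,i}. \tag{49}$$

Now put

$$V_{n,N}(\gamma) = \sum_{p=1}^N (\lambda_p+\gamma)^{-1}\left(S_{n,p} + \sqrt{\frac{n_1n_2}{n}}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right)^2.$$

Because the $\{Y_{n,p,i}\}$ are zero-mean and independent, Lemma 16-Eq. (47) shows that, as $n$ goes to infinity, $\sum_{i=1}^n \mathrm{Cov}(Y_{n,p,i}, Y_{n,q,i}) \to \lambda_p^{1/2}\lambda_q^{1/2}\delta_{p,q}$. In addition, the Lyapunov condition is satisfied, since, using (48), $\sum_{i=1}^n E[Y^4_{n,p,i}] \leq Cn^{-1}\lambda_p$. We may thus apply the central limit theorem for multivariate triangular arrays, which yields $S_{n,N} \xrightarrow{\mathcal{D}} \mathcal{N}(0, \Lambda_N)$, where $S_{n,N} = (S_{n,1}, \ldots, S_{n,N})$ and $(\Lambda_N)_{p,q} = \delta_{p,q}\lambda_p$, $1 \leq p, q \leq N$. Fix $u$ and let $\varepsilon > 0$ be given.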
Then, using the version of the continuous mapping theorem stated in (van der Vaart, 1998, Theorem 18.11), with the sequence of quadratic functions $\{g_n\}_{n\geq1}$ defined as

$$g_n : \mathbb{R}^N \to \mathbb{R}, \quad (T_1, \ldots, T_N) \mapsto (T_N + a_n)^\top[\mathrm{diag}(a_1, \ldots, a_N)](T_N + a_n),$$

we may write

$$\left|E[e^{iuV_{n,N}(\gamma)}] - E[e^{iuW_{n,N}(\gamma)}]\right| < \varepsilon, \tag{51}$$

with $W_{n,N}(\gamma) = \sum_{p=1}^N (\lambda_p+\gamma)^{-1}\lambda_p(Z_p + a_{n,p})^2$, where $\{Z_p\}_{p\geq1}$ are independent standard normal random variables, defined on a common probability space, and $\{a_{n,p}\}_{p\geq1}$ are defined in (13). Next, we prove that $\lim_{N\to\infty}\limsup_{n\to\infty} E[(V_{n,\infty}(\gamma) - V_{n,N}(\gamma))^2] = 0$. By the Rosenthal inequality (see Petrov, 1995, Theorem 2.12), there exists a constant $C$ such that $E[S^4_{n,p}] \leq C(n^{-1}\lambda_p + \lambda_p^2)$. The Minkowski inequality then leads to

$$E^{1/2}[(V_{n,\infty}(\gamma) - V_{n,N}(\gamma))^2] \leq \sum_{p=N+1}^\infty (\lambda_p+\gamma)^{-1}E^{1/2}\left[\left(S_{n,p} + \sqrt{\tfrac{n_1n_2}{n}}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right)^4\right]$$

$$\leq C\gamma^{-1}\sum_{p=N+1}^\infty \lambda_p^{1/2}(n^{-1/2} + \lambda_p^{1/2}) + n\sum_{p=N+1}^\infty (\lambda_p+\gamma)^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} + o(1).$$



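Notice that, using (28) in Proposition 10 with $\Sigma = \Sigma_1$, we have

$$n\sum_{p=N+1}^\infty (\lambda_p+\gamma)^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} \leq \gamma^{-1}\lambda_{N+1}\,n\sum_{p=1}^\infty \lambda_p^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} \leq \gamma^{-1}\lambda_{N+1}\,n\eta_n^2, \tag{52}$$

which goes to zero uniformly in $n$ as $N \to \infty$. Therefore, under Assumptions (B1) and (C), we may choose $N$ large enough so that

$$\left|E[e^{iuV_{n,\infty}(\gamma)}] - E[e^{iuV_{n,N}(\gamma)}]\right| < \varepsilon. \tag{53}$$

Similar calculations allow us to prove that $E[(W_{n,\infty}(\gamma) - W_{n,N}(\gamma))^2] = o(1)$, where $W_{n,\infty}(\gamma)$ is defined as $W_{n,N}(\gamma)$ with $N = \infty$, which yields that, for all $\varepsilon > 0$ and a sufficiently large $N$, we have

$$\left|E[e^{iuW_{n,\infty}(\gamma)}] - E[e^{iuW_{n,N}(\gamma)}]\right| < \varepsilon. \tag{54}$$

Finally, combining (51), (53) and (54), by the triangle inequality, we have proved that, for $\varepsilon > 0$, we may choose a sufficiently large $N$ such that

$$\left|E[e^{iuV_{n,\infty}(\gamma)}] - E[e^{iuW_{n,\infty}(\gamma)}]\right| < 3\varepsilon, \tag{55}$$

and the proof is concluded by invoking Lévy's continuity theorem (Billingsley, 1995, Theorem 26.3). □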
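For fixed $\gamma$, the proof identifies the limit through $W_{n,N}(\gamma) = \sum_p (\lambda_p+\gamma)^{-1}\lambda_p(Z_p + a_{n,p})^2$. Under the null hypothesis ($a_{n,p} = 0$), the centred and scaled variable $(W - d_1)/(\sqrt{2}\,d_2)$ has mean $0$ and variance $1$, but it is not Gaussian for fixed $\gamma$. The Monte Carlo sketch below, which assumes a truncated spectrum $\lambda_p = p^{-2}$ purely for illustration, draws from this limit and reads off an empirical $(1-\alpha)$-quantile that could serve as a critical value; the function names are ours.

```python
import math
import random


def simulate_null_limit(eigenvalues, gamma, n_draws, seed=0):
    """Draws of (W - d1) / (sqrt(2) d2) with W = sum_p r_p Z_p^2, r_p = lam_p / (lam_p + gamma)."""
    rng = random.Random(seed)
    r = [lam / (lam + gamma) for lam in eigenvalues]
    d1 = sum(r)
    d2 = math.sqrt(sum(x * x for x in r))
    draws = []
    for _ in range(n_draws):
        w = sum(rp * rng.gauss(0.0, 1.0) ** 2 for rp in r)   # weighted sum of chi-square(1)
        draws.append((w - d1) / (math.sqrt(2.0) * d2))
    return draws


def empirical_quantile(xs, level):
    """Smallest order statistic x_(k) such that k / len(xs) >= level."""
    s = sorted(xs)
    k = max(1, math.ceil(level * len(s)))
    return s[k - 1]


if __name__ == "__main__":
    eigs = [p ** -2.0 for p in range(1, 31)]
    draws = simulate_null_limit(eigs, gamma=0.01, n_draws=5000)
    print("approximate 95% critical value:", empirical_quantile(draws, 0.95))
```

Because the weights $\lambda_p/(\lambda_p+\gamma)$ are dominated by a few leading components, the simulated limit is right-skewed and its $95\%$ quantile typically exceeds the Gaussian value $1.645$.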
Remark 17. For the sake of generality, we proved the result under the assumption that $n\eta_n^2 = O(1)$. However, if there exists a nonnegative nondecreasing sequence of integers $\{q_n\}_{n\geq1}$ such that for all $n$ we have $\sum_{p=1}^\infty (\lambda_p+\gamma)^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} = (\lambda_{q_n}+\gamma)^{-1}\langle\delta_n, e_{q_n}\rangle^2_{\mathcal{H}}$, then the truncation argument used in (52) is valid under a weaker assumption. In particular, when considering non-directional sequences of local alternatives as in (25), it suffices to take $N \to \infty$ such that $N^{-1}q_n = o(1)$, which, for $n$ sufficiently large, allows us to get $n\sum_{p=N+1}^\infty (\lambda_p+\gamma)^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} = 0$ in place of (52) in the proof. The rest of the proof follows similarly.

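The following lemma highlights the main difference between the asymptotics when $\gamma_n = \gamma$ and when $\gamma_n \to 0$: $d_1(\Sigma_1,\gamma_n) \to \infty$ and $d_2(\Sigma_1,\gamma_n) \to \infty$ in the case $\gamma_n \to 0$, whereas they act as irrelevant constants in the case $\gamma_n = \gamma$.

Lemma 18. If $\gamma_n = o(1)$, then $d_1(\Sigma_1,\gamma_n) \to \infty$ and $d_2(\Sigma_1,\gamma_n) \to \infty$ as $n$ tends to infinity.

Proof. Since the function $x \mapsto x/(x + \gamma_n)$ is monotone increasing, for any $\lambda \geq \gamma_n$, $\lambda/(\lambda+\gamma_n) \geq 1/2$. Therefore,

$$\sum_{p=1}^\infty \frac{\lambda_p(\Sigma_1)}{\lambda_p(\Sigma_1) + \gamma_n} \geq \frac{1}{2}\,\#\{k : \lambda_k(\Sigma_1) \geq \gamma_n\},$$

and similarly $d_2^2(\Sigma_1,\gamma_n) \geq \frac{1}{4}\,\#\{k : \lambda_k(\Sigma_1) \geq \gamma_n\}$. The proof is concluded by noting that, since $\gamma_n \to 0$, $\#\{k : \lambda_k(\Sigma_1) \geq \gamma_n\} \to \infty$ as $n$ tends to infinity. □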

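The quantities $\lambda_p(\Sigma_1)$, $\lambda_q(\Sigma_1)$, $d_1(\Sigma_1,\gamma_n)$, $d_2(\Sigma_1,\gamma_n)$ being pervasive in the subsequent proofs, they shall respectively be abbreviated as $\lambda_p$, $\lambda_q$, $d_{1,n}$, $d_{2,n}$. Our test statistic writes as $T_n = (\sqrt{2}\,d_{2,n})^{-1}A_n$ with

$$A_n \stackrel{\mathrm{def}}{=} \frac{n_1n_2}{n}\left\|(\Sigma_1 + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} - d_{1,n}. \tag{56}$$

Using the quantities $S_{n,p}$ and $Y_{n,p,i}$ defined respectively in (49) and (46), $A_n$ may be expressed as

$$A_n = \sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\left\{S_{n,p} + \sqrt{\tfrac{n_1n_2}{n}}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right\}^2 - d_{1,n}$$

$$= \sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\left\{S^2_{n,p} - ES^2_{n,p} + 2\sqrt{\tfrac{n_1n_2}{n}}\,S_{n,p}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right\} + \frac{n_1n_2}{n}\langle\delta_n, (\Sigma_1+\gamma_nI)^{-1}\delta_n\rangle_{\mathcal{H}} + \sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\{ES^2_{n,p} - \lambda_p\}.$$

Since, by Lemma 16 Eq. (47), $ES^2_{n,p} - \lambda_p = (n_1/n)\lambda_p\varepsilon_{p,p}$, where $\varepsilon_{p,p}$ is defined in (33), the Hölder inequality yields

$$\left|\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\{ES^2_{n,p} - \lambda_p\}\right| \leq \left(\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}\lambda_p^2\right)^{1/2}\left(\sum_{p=1}^\infty \varepsilon^2_{p,p}\right)^{1/2} = O(d_{2,n}\eta_n).$$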
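We now decompose

$$\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\left\{S^2_{n,p} - ES^2_{n,p} + 2\sqrt{\tfrac{n_1n_2}{n}}\,S_{n,p}\langle\delta_n, e_p\rangle_{\mathcal{H}}\right\} = B_n + 2C_n + 2D_n,$$

where $B_n$, $C_n$ and $D_n$ are defined as follows:

$$B_n \stackrel{\mathrm{def}}{=} \sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\sum_{i=1}^n\{Y^2_{n,p,i} - E[Y^2_{n,p,i}]\}, \tag{57}$$

$$C_n \stackrel{\mathrm{def}}{=} \sqrt{\frac{n_1n_2}{n}}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\left(\sum_{i=1}^n Y_{n,p,i}\right)\langle\delta_n, e_p\rangle_{\mathcal{H}}, \tag{58}$$

$$D_n \stackrel{\mathrm{def}}{=} \sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\sum_{i=1}^n Y_{n,p,i}\left\{\sum_{j=1}^{i-1}Y_{n,p,j}\right\}. \tag{59}$$

The proof is in three steps. We will first show that $B_n$ is negligible, then that $C_n$ is negligible, and finally establish a central limit theorem for $D_n$.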
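The split of $S^2_{n,p} - ES^2_{n,p}$ into $B_n$ and $D_n$ rests on the pathwise identity $(\sum_i Y_i)^2 = \sum_i Y_i^2 + 2\sum_i Y_i M_{i-1}$ with $M_{i-1} = \sum_{j<i} Y_j$; the cross terms have zero mean, so only the diagonal part contributes to $ES^2_{n,p}$. The short check below, on arbitrary simulated numbers, illustrates the identity for a single component $p$; it is only an algebraic sanity check, and the helper name is ours.

```python
import random


def martingale_split(ys):
    """Return (sum_i y_i^2, sum_i y_i * M_{i-1}) with M_{i-1} = y_1 + ... + y_{i-1}."""
    diag, cross, running = 0.0, 0.0, 0.0
    for y in ys:
        diag += y * y
        cross += y * running   # y_i times the partial sum of the previous terms
        running += y
    return diag, cross


if __name__ == "__main__":
    rng = random.Random(1)
    ys = [rng.gauss(0.0, 1.0) for _ in range(10)]
    diag, cross = martingale_split(ys)
    print(sum(ys) ** 2, diag + 2.0 * cross)   # the two numbers agree up to rounding
```

The second component is exactly the martingale sum that, after weighting by $(\lambda_p+\gamma_n)^{-1}$ and summing over $p$, defines $D_n$.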

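Step 1: $B_n = o_P(1)$. The proof amounts to computing the variance of this term. Since the variables $Y_{n,p,i}$ and $Y_{n,q,j}$ are independent if $i \neq j$, then $\mathrm{Var}(B_n) = \sum_{i=1}^n v_{n,i}$, where

$$v_{n,i} \stackrel{\mathrm{def}}{=} \mathrm{Var}\left(\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\{Y^2_{n,p,i} - E[Y^2_{n,p,i}]\}\right) = \sum_{p,q=1}^\infty (\lambda_p+\gamma_n)^{-1}(\lambda_q+\gamma_n)^{-1}\,\mathrm{Cov}(Y^2_{n,p,i}, Y^2_{n,q,i}).$$

Using Lemma 16 Eq. (48), we get

$$\sum_{i=1}^n v_{n,i} \leq Cn^{-1}\left(\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\lambda_p^{1/2}(1 + \varepsilon_{p,p})^{1/2}\right)^2 \leq Cn^{-1}\gamma_n^{-2}\left(\sum_{p=1}^\infty \lambda_p^{1/2}(1 + \varepsilon_{p,p})^{1/2}\right)^2,$$

where the right-hand side is indeed negligible, since by assumption we have $\gamma_n^{-1}n^{-1/2} \to 0$ and $\sum_{p=1}^\infty \lambda_p^{1/2} < \infty$. □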

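Step 2: $C_n = o_P(d_{2,n})$. Again, the proof essentially consists in computing the variance of this term, and then concluding by the Markov inequality. As previously, since the variables $Y_{n,p,i}$ and $Y_{n,q,j}$ are independent if $i \neq j$, then $\mathrm{Var}(C_n) = \sum_{i=1}^n u_{n,i}$, where

$$u_{n,i} \stackrel{\mathrm{def}}{=} \frac{n_1n_2}{n}\left\{\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}E[Y^2_{n,p,i}]\langle\delta_n, e_p\rangle^2_{\mathcal{H}} + \sum_{p\neq q}(\lambda_p+\gamma_n)^{-1}(\lambda_q+\gamma_n)^{-1}E[Y_{n,p,i}Y_{n,q,i}]\,\langle\delta_n, e_p\rangle_{\mathcal{H}}\langle\delta_n, e_q\rangle_{\mathcal{H}}\right\}.$$

Moreover, note that $E[Y^2_{n,p,i}] \leq Cn^{-1}\lambda_p$ and $\lambda_p(\lambda_p+\gamma_n)^{-2} \leq \lambda_p^{-1}$, so that, under Assumption (C) and using (28),

$$d_{2,n}^{-2}\,\frac{n_1n_2}{n}\sum_{i=1}^n\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}E[Y^2_{n,p,i}]\langle\delta_n, e_p\rangle^2_{\mathcal{H}} \leq C d_{2,n}^{-2}\,n\sum_{p=1}^\infty \lambda_p^{-1}\langle\delta_n, e_p\rangle^2_{\mathcal{H}} \leq C d_{2,n}^{-1}\left(d_{2,n}^{-1}n\eta_n^2\right) = o(1).$$

Similarly, for $p \neq q$ we have $|E[Y_{n,p,i}Y_{n,q,i}]| \leq Cn^{-1}\lambda_p^{1/2}\lambda_q^{1/2}|\varepsilon_{p,q}|$, which, after summing over $i$ and applying the Cauchy-Schwarz inequality, implies that

$$d_{2,n}^{-2}\,n\sum_{p\neq q}(\lambda_p+\gamma_n)^{-1}(\lambda_q+\gamma_n)^{-1}\lambda_p^{1/2}\lambda_q^{1/2}|\langle\delta_n, e_p\rangle_{\mathcal{H}}||\langle\delta_n, e_q\rangle_{\mathcal{H}}||\varepsilon_{p,q}| \leq C d_{2,n}^{-2}\,n\left(\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}\lambda_p\langle\delta_n, e_p\rangle^2_{\mathcal{H}}\right)\left(\sum_{p,q}\varepsilon^2_{p,q}\right)^{1/2} = o(1).$$

□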
Step 3: $d_{2,n}^{-1}D_n \xrightarrow{\mathcal{D}} \mathcal{N}(0, 1/2)$. We use the central limit theorem (CLT) for triangular arrays of martingale differences (Hall and Heyde, 1980, Theorem 3.2). For $i = 1, \ldots, n$, denote

$$\xi_{n,i} \stackrel{\mathrm{def}}{=} d_{2,n}^{-1}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}Y_{n,p,i}M_{n,p,i-1}, \quad\text{where}\quad M_{n,p,i} \stackrel{\mathrm{def}}{=} \sum_{j=1}^i Y_{n,p,j}, \tag{60}$$

and let $\mathcal{F}_{n,i} = \sigma(Y_{n,p,j},\ p \in \{1, \ldots, n\},\ j \in \{0, \ldots, i\})$. Note that, by construction, $\xi_{n,i}$ is a martingale increment, that is $E[\xi_{n,i} \mid \mathcal{F}_{n,i-1}] = 0$. The first step in the proof of the CLT is to establish that

$$s_n^2 \stackrel{\mathrm{def}}{=} \sum_{i=1}^n E[\xi^2_{n,i} \mid \mathcal{F}_{n,i-1}] \xrightarrow{P} 1/2. \tag{61}$$

The second step of the proof is to establish the negligibility condition. We invoke (Hall and Heyde, 1980, Theorem 3.2), which requires establishing that $\max_{1\leq i\leq n}|\xi_{n,i}| \xrightarrow{P} 0$ (smallness) and that $E(\max_{1\leq i\leq n}\xi^2_{n,i})$ is bounded in $n$ (tightness), where $\xi_{n,i}$ is defined in (60). We will establish the two conditions simultaneously by checking that

$$E\left[\max_{1\leq i\leq n}\xi^2_{n,i}\right] = o(1). \tag{62}$$

Splitting the sum $s_n^2$ between diagonal terms $E_n$ and off-diagonal terms $F_n$, we have

$$E_n = d_{2,n}^{-2}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}\sum_{i=1}^n M^2_{n,p,i-1}E[Y^2_{n,p,i}], \tag{63}$$

$$F_n = d_{2,n}^{-2}\sum_{p\neq q}(\lambda_p+\gamma_n)^{-1}(\lambda_q+\gamma_n)^{-1}\sum_{i=1}^n M_{n,p,i-1}M_{n,q,i-1}E[Y_{n,p,i}Y_{n,q,i}]. \tag{64}$$
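Consider first the diagonal terms $E_n$. We first compute its mean. Note that $E[M^2_{n,p,i}] = \sum_{j=1}^i E[Y^2_{n,p,j}]$. Using Lemma 16, we get

$$E[E_n] = d_{2,n}^{-2}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}\sum_{i=1}^n\sum_{j=1}^{i-1}E[Y^2_{n,p,j}]\,E[Y^2_{n,p,i}]$$

$$= \frac{1}{2}\,d_{2,n}^{-2}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}\left\{\Big(\sum_{i=1}^n E[Y^2_{n,p,i}]\Big)^2 - \sum_{i=1}^n E^2[Y^2_{n,p,i}]\right\} = \frac{1}{2}\left\{1 + O(d_{2,n}^{-1}\eta_n) + O(n^{-1})\right\}.$$

Therefore, $E[E_n] = 1/2 + o(1)$. Next, we check that $E_n - E[E_n] = o_P(1)$ is negligible. We write $E_n - E[E_n] = d_{2,n}^{-2}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-2}Q_{n,p}$, with

$$Q_{n,p} \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n-1}E[Y^2_{n,p,i+1}]\left\{M^2_{n,p,i} - E[M^2_{n,p,i}]\right\}. \tag{65}$$

Using this notation,

$$\mathrm{Var}[E_n] = d_{2,n}^{-4}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-4}E[Q^2_{n,p}] + 2d_{2,n}^{-4}\sum_{1\leq p<q\leq n}(\lambda_p+\gamma_n)^{-2}(\lambda_q+\gamma_n)^{-2}E[Q_{n,p}Q_{n,q}]. \tag{66}$$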

We will establish that

$$|E[Q_{n,p}Q_{n,q}]| \leq C\left\{\lambda_p^2\lambda_q^2(\delta_{p,q} + |\varepsilon_{p,q}|)^2 + n^{-1}\lambda_p^{3/2}\lambda_q^{3/2}\right\}. \tag{67}$$

Plugging this bound into (66) and using that $\lambda_p/(\lambda_p+\gamma_n) \leq 1$ and $d_{2,n} \to \infty$ as $n$ tends to infinity yields, under Assumption (B1),

$$\mathrm{Var}[E_n] \leq C\left\{d_{2,n}^{-2} + d_{2,n}^{-4}\sum_{p,q=1}^\infty \varepsilon^2_{p,q} + n^{-1}\gamma_n^{-2}d_{2,n}^{-4}\Big(\sum_{p=1}^\infty \lambda_p^{1/2}\Big)^2\right\},$$

showing that $\mathrm{Var}[E_n] = o(1)$, and hence that $E_n - E[E_n] = o_P(1)$. To show (67), note first that $\{M^2_{n,p,i} - E[M^2_{n,p,i}]\}_{1\leq i\leq n}$ is an $\mathcal{F}_{n,i}$-adapted martingale. Denote by $\nu_{n,p,i}$ its increment, defined recursively as follows: $\nu_{n,p,1} = M^2_{n,p,1} - E[M^2_{n,p,1}]$ and, for $i > 1$,

$$\nu_{n,p,i} = M^2_{n,p,i} - E[M^2_{n,p,i}] - \{M^2_{n,p,i-1} - E[M^2_{n,p,i-1}]\} = Y^2_{n,p,i} - E[Y^2_{n,p,i}] + 2Y_{n,p,i}M_{n,p,i-1}.$$

Using the summation by parts formula, $Q_{n,p}$ may be expressed as

$$Q_{n,p} = \sum_{i=1}^{n-1}\nu_{n,p,i}\sum_{l=i}^{n-1}E[Y^2_{n,p,l+1}].$$
Using Lemma 16 Eq. (47), we obtain, for any $1 \leq p < q \leq n$,

$$|E[Q_{n,p}Q_{n,q}]| \leq \left(\sum_{l=1}^{n-1}E[Y^2_{n,p,l+1}]\right)\left(\sum_{l=1}^{n-1}E[Y^2_{n,q,l+1}]\right)\sum_{i=1}^{n-1}|E[\nu_{n,p,i}\nu_{n,q,i}]| \leq C\lambda_p\lambda_q(1 + O(\eta_n))\sum_{i=1}^{n-1}|E[\nu_{n,p,i}\nu_{n,q,i}]|. \tag{68}$$

We get

$$E[\nu_{n,p,i}\nu_{n,q,i}] = \mathrm{Cov}(Y^2_{n,p,i}, Y^2_{n,q,i}) + 4E[Y_{n,p,i}Y_{n,q,i}]\,E[M_{n,p,i-1}M_{n,q,i-1}].$$

First, applying Eq. (48) in Lemma 16 gives

$$\sum_{i=1}^{n-1}\mathrm{Cov}(Y^2_{n,p,i}, Y^2_{n,q,i}) \leq Cn^{-1}\lambda_p^{1/2}\lambda_q^{1/2}. \tag{69}$$

Since $E[M_{n,p,i-1}M_{n,q,i-1}] = \sum_{j=1}^{i-1}E[Y_{n,p,j}Y_{n,q,j}]$, Lemma 16 Eq. (47) shows that

$$\sum_{i=1}^{n-1}\left|E[Y_{n,p,i}Y_{n,q,i}]\,E[M_{n,p,i-1}M_{n,q,i-1}]\right| \leq \left(\sum_{i=1}^n |E[Y_{n,p,i}Y_{n,q,i}]|\right)^2 \leq C\lambda_p\lambda_q(\delta_{p,q} + |\varepsilon_{p,q}|)^2. \tag{70}$$

Eq. (67) follows by plugging (69) and (70) into (68). We finally consider $F_n$ defined in (64). We will establish that $F_n = o_P(1)$. Using Lemma 16 Eq. (47), $E^{1/2}[M^2_{n,p,i-1}M^2_{n,q,i-1}] \leq C\lambda_p^{1/2}\lambda_q^{1/2}$ and $|E[Y_{n,p,i}Y_{n,q,i}]| \leq Cn^{-1}\lambda_p^{1/2}\lambda_q^{1/2}|\varepsilon_{p,q}|$, so that the Minkowski inequality implies

$$\{E[F_n^2]\}^{1/2} \leq Cd_{2,n}^{-2}\sum_{p\neq q}(\lambda_p+\gamma_n)^{-1}(\lambda_q+\gamma_n)^{-1}\lambda_p\lambda_q|\varepsilon_{p,q}| \leq C\eta_n,$$

showing that $F_n = o_P(1)$. This concludes the proof of Eq. (61).

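We finally show Eq. (62). Since $|Y_{n,p,i}| \leq Cn^{-1/2}|k|_\infty^{1/2}$ $P$-a.s., we may bound

$$\max_{1\leq i\leq n}|\xi_{n,i}| \leq Cd_{2,n}^{-1}n^{-1/2}\sum_{p=1}^\infty (\lambda_p+\gamma_n)^{-1}\max_{1\leq i\leq n}|M_{n,p,i-1}|. \tag{71}$$

Then, the Doob inequality implies that $E^{1/2}[\max_{1\leq i\leq n}|M_{n,p,i-1}|^2] \leq 2E^{1/2}[M^2_{n,p,n-1}] \leq C\lambda_p^{1/2}$. Plugging this bound into (71), the Minkowski inequality yields

$$E^{1/2}\left(\max_{1\leq i\leq n}\xi^2_{n,i}\right) \leq C\,d_{2,n}^{-1}\gamma_n^{-1}n^{-1/2}\sum_{p=1}^\infty \lambda_p^{1/2},$$

and the proof is concluded using the fact that $\gamma_n + d_2^{-1}(\Sigma_1,\gamma_n)\,d_1(\Sigma_1,\gamma_n)\,\gamma_n^{-1}n^{-1/2} \to 0$ and Assumption (B1). □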
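11. Proof of Theorem 5

Proof of Proposition. We denote by $\Sigma = \rho_1\Sigma_1 + \rho_2\Sigma_2 + \rho_1\rho_2\,\delta\otimes\delta$ the covariance operator associated with the probability density $p = \rho_1p_1 + \rho_2p_2$, where $\delta = \mu_2 - \mu_1$. Then, Proposition 10 applied to the probability densities $p_1$, $p_2$ and $p = \rho_1p_1 + \rho_2p_2$ shows that $\langle\delta, \Sigma^{-1}\delta\rangle_{\mathcal{H}} = \int (p_1 - p_2)^2/p\, d\nu$, where $\nu$ denotes the measure with respect to which the densities are taken. Thus

$$\rho_1\rho_2\langle\delta, \Sigma^{-1}\delta\rangle_{\mathcal{H}} = \frac{1}{2}\int \left[\frac{\rho_1}{\rho_2}(p_1 - p)^2 + \frac{\rho_2}{\rho_1}(p_2 - p)^2\right]\frac{d\nu}{p} = 1 - \int \frac{p_1p_2}{p}\,d\nu.$$

The previous equality shows that $\rho_1\rho_2\langle\delta, \Sigma^{-1}\delta\rangle_{\mathcal{H}} < 1$ as soon as $\int p_1p_2/p\,d\nu \neq 0$. Therefore, in this situation,

$$\langle\delta, (\rho_1\Sigma_1 + \rho_2\Sigma_2)^{-1}\delta\rangle_{\mathcal{H}} = \langle\delta, (\Sigma - \rho_1\rho_2\,\delta\otimes\delta)^{-1}\delta\rangle_{\mathcal{H}} = \langle\delta, \Sigma^{-1}\delta\rangle_{\mathcal{H}}\left(1 - \rho_1\rho_2\langle\delta, \Sigma^{-1}\delta\rangle_{\mathcal{H}}\right)^{-1},$$

and the proof follows by combining the two latter equations.

Consider now the case where $\int p_1p_2/p\,d\nu = 0$, that is when the probability distributions $P_1$ and $P_2$ are singular (for any set $A \in \mathfrak{X}$ such that $P_1(A) \neq 0$, $P_2(A) = 0$, and vice versa). In that case, $\langle\delta, (\rho_1\Sigma_1 + \rho_2\Sigma_2)^{-1}\delta\rangle_{\mathcal{H}}$ is infinite. □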
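The two ingredients of this proof, the identity $\rho_1\rho_2\int(p_1-p_2)^2/p\,d\nu = 1 - \int p_1p_2/p\,d\nu$ with $p = \rho_1p_1 + \rho_2p_2$, and the closed form $s(1-\rho_1\rho_2 s)^{-1}$ with $s = \int(p_1-p_2)^2/p\,d\nu$, can be checked on discrete distributions, where integrals become finite sums over atoms (counting measure as $\nu$, an assumption made only for this illustration). The function names are ours, and the code evaluates the closed form, not the operator itself.

```python
import math


def mixture_terms(p1, p2, rho1):
    """Return (s, overlap): s = sum (p1 - p2)^2 / p, overlap = sum p1 p2 / p, p = rho1 p1 + rho2 p2."""
    rho2 = 1.0 - rho1
    s = overlap = 0.0
    for a, b in zip(p1, p2):
        mix = rho1 * a + rho2 * b
        if mix > 0.0:  # atoms outside the support of p carry no mass under P1 or P2
            s += (a - b) ** 2 / mix
            overlap += a * b / mix
    return s, overlap


def closed_form_value(p1, p2, rho1):
    """s / (1 - rho1 rho2 s), which equals s / overlap; infinite when P1 and P2 are singular."""
    s, overlap = mixture_terms(p1, p2, rho1)
    return math.inf if overlap == 0.0 else s / overlap


if __name__ == "__main__":
    p1, p2, rho1 = [0.5, 0.3, 0.2, 0.0], [0.1, 0.2, 0.3, 0.4], 0.3
    s, overlap = mixture_terms(p1, p2, rho1)
    print(rho1 * (1 - rho1) * s, 1 - overlap)   # the two sides of the identity
    print(closed_form_value(p1, p2, rho1), closed_form_value([1.0, 0.0], [0.0, 1.0], 0.5))
```

The singular example (disjoint supports) gives zero overlap and hence an infinite value, matching the last case of the proof.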

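Proof. We first prove that

$$\left\|(\hat\Sigma_W + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} \xrightarrow{P} \left\|(\Sigma_W + \gamma_\infty I)^{-1/2}\delta\right\|^2_{\mathcal{H}}, \tag{72}$$

where $\gamma_\infty = \lim_{n\to\infty}\gamma_n$. Using straightforward algebra, we may write

$$\left|\left\|(\hat\Sigma_W + \gamma_nI)^{-1/2}\hat\delta\right\|^2_{\mathcal{H}} - \left\|(\Sigma_W + \gamma_\infty I)^{-1/2}\delta\right\|^2_{\mathcal{H}}\right| \leq C_1 + C_2 + C_3, \tag{73}$$

where

$$C_1 \stackrel{\mathrm{def}}{=} \|(\hat\Sigma_W+\gamma_nI)^{-1/2}\hat\delta\|_{\mathcal{H}}\,\|(\Sigma_W+\gamma_nI)^{-1/2}\hat\delta\|_{\mathcal{H}}\,\|(\hat\Sigma_W+\gamma_nI)^{-1/2}(\hat\Sigma_W - \Sigma_W)(\Sigma_W+\gamma_nI)^{-1/2}\|_{HS},$$

$$C_2 \stackrel{\mathrm{def}}{=} \left|\|(\Sigma_W+\gamma_nI)^{-1/2}\hat\delta\|^2_{\mathcal{H}} - \|(\Sigma_W+\gamma_nI)^{-1/2}\delta\|^2_{\mathcal{H}}\right|,$$

$$C_3 \stackrel{\mathrm{def}}{=} \left|\|(\Sigma_W+\gamma_nI)^{-1/2}\delta\|^2_{\mathcal{H}} - \|(\Sigma_W+\gamma_\infty I)^{-1/2}\delta\|^2_{\mathcal{H}}\right|.$$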

First, we prove that $C_1 = o_P(1)$. Write $C_1 = A_1A_2B_1$. Using (with obvious changes) the relation (42), the monotone convergence theorem yields

$$
\lim_{n\to\infty}\mathbb E\,\|(\Sigma_W+\gamma_nI)^{-1/2}\hat\delta\|_{\mathcal H}^2 = \langle\delta,(\Sigma_W+\gamma_\infty I)^{-1}\delta\rangle_{\mathcal H},
$$

which gives $A_1 = O_P(1)$. As for proving $A_2 = O_P(1)$, using an argument similar to the one used to derive Eq. (??), it suffices to observe that $A_2 = A_1 + o_P(1)$. Then, Eq. (35) in Corollary [13] gives $B_1 = O_P(\gamma_n^{-1}n^{-1/2})$, which shows that $C_1 = A_1A_2B_1 = o_P(1)$.

Next, we prove that $C_2 = o_P(1)$. We may write

$$
C_2 \le 2\left|\langle\hat\delta-\delta,(\Sigma_W+\gamma_nI)^{-1}\delta\rangle_{\mathcal H}\right| + \|(\Sigma_W+\gamma_nI)^{-1/2}(\hat\delta-\delta)\|_{\mathcal H}^2 .
$$

Since $\|(\Sigma_W+\gamma_nI)^{-1/2}\| \le \gamma_n^{-1/2}$ and $\|(\Sigma_W+\gamma_nI)^{-1/2}\delta\|_{\mathcal H} < \infty$, and moreover $\|\hat\delta-\delta\|_{\mathcal H} = O_P(n^{-1/2})$, we get $C_2 = O_P(\gamma_n^{-1/2}n^{-1/2}) = o_P(1)$.

Finally, we prove that $C_3 = o(1)$. Note that $C_3 = \sum_{p\ge1}|\gamma_n-\gamma_\infty|(\lambda_p+\gamma_n)^{-1}(\lambda_p+\gamma_\infty)^{-1}\langle\delta,e_p\rangle_{\mathcal H}^2$, where $\{\lambda_p\}$ and $\{e_p\}$ denote respectively the eigenvalues and eigenvectors of $\Sigma_W$. Since $\gamma\mapsto(\lambda_p+\gamma)^{-1}\gamma$ is monotone, the monotone convergence theorem shows that $C_3 = o(1)$.

Now, when $P_1 \neq P_2$, Proposition H with $P = \rho_1P_1+\rho_2P_2$ ensures that $\delta \in \mathcal R(\Sigma_W^{1/2})$ as long as $\|dP_2/dP_1 - 1\|_{L^2(P_1)} < \infty$. Then, under assumption (A2), by injectivity of $\Sigma_W$ we have $\|\Sigma_W^{-1/2}\delta\|_{\mathcal H} \neq 0$. Hence, since $\Sigma_W$ is trace-class, we may apply Lemma [19] with $\alpha = 1$, which yields $n\,d_2^{-1}(\Sigma_W,\gamma_n) \to \infty$. Therefore, $\hat T_n(\gamma_n) \xrightarrow{P} \infty$, and the proof is concluded. Otherwise, that is when $\|dP_2/dP_1 - 1\|_{L^2(P_1)} = \infty$, we also have $\hat T_n(\gamma_n) \xrightarrow{P} \infty$. □
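To see the monotone-convergence argument for $C_3$ in the preceding proof at work, the sketch below evaluates $C_3$ in the case $\gamma_\infty = 0$, where $C_3 = \sum_{p}\gamma\,\lambda_p^{-1}(\lambda_p+\gamma)^{-1}\langle\delta,e_p\rangle_{\mathcal H}^2$. The spectrum $\lambda_p = p^{-2}$ and the coefficients $\langle\delta,e_p\rangle_{\mathcal H} = p^{-2}$ are hypothetical choices (they make $\|\Sigma_W^{-1/2}\delta\|_{\mathcal H}$ finite), and the series is truncated at $10^5$ terms. The values decrease towards zero as $\gamma$ decreases.

```python
import numpy as np

def c3(gamma, lam, delta_coef):
    """C_3 when gamma_infty = 0: sum_p gamma / (lam_p (lam_p + gamma)) <delta, e_p>^2."""
    return np.sum(gamma / (lam * (lam + gamma)) * delta_coef ** 2)

p = np.arange(1, 100_001, dtype=float)
lam = p ** -2.0          # hypothetical eigenvalues of Sigma_W
delta_coef = p ** -2.0   # hypothetical coordinates <delta, e_p>; sum delta_p^2 / lam_p < inf
gammas = [1e-1, 1e-2, 1e-3, 1e-4]
values = [c3(g, lam, delta_coef) for g in gammas]
print(values)
```

In this example $C_3$ behaves like $\sqrt{\gamma}\,\pi/2$ for small $\gamma$, so the decay to zero can be slow.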



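Appendix A. Technical Lemmas

Lemma 19. Let $\{\lambda_p\}_{p\ge1}$ be a non-increasing sequence of non-negative numbers. Let $\alpha > 0$. Assume that $\sum_{p\ge1}\lambda_p^\alpha < \infty$. Then, for any $\beta \ge \alpha$,

$$
\sup_{\gamma>0}\,\gamma^\alpha\sum_{p=1}^\infty\lambda_p^\beta(\lambda_p+\gamma)^{-\beta} \le 2\sum_{p=1}^\infty\lambda_p^\alpha . \tag{74}
$$

In addition, if $\lim_{p\to\infty}p\lambda_p^\alpha = \infty$, then for any $\beta > 0$,

$$
\lim_{\gamma\to0^+}\gamma^\alpha\sum_{p=1}^\infty\lambda_p^\beta(\lambda_p+\gamma)^{-\beta} = \infty .
$$

Proof. For $\gamma > 0$, denote by $q_\gamma = \sup\{p\ge1 : \lambda_p > \gamma\}$. Then, since $\lambda_p^\beta(\lambda_p+\gamma)^{-\beta}\le1$ for $p\le q_\gamma$, and since $\lambda_p\le\gamma$ for $p>q_\gamma$ and $\beta\ge\alpha$,

$$
\gamma^\alpha\sum_{p=1}^\infty\lambda_p^\beta(\lambda_p+\gamma)^{-\beta} \le \gamma^\alpha q_\gamma + \gamma^\alpha\sum_{p>q_\gamma}\left(\frac{\lambda_p}{\gamma}\right)^{\beta} \tag{75}
$$

$$
\le \gamma^\alpha q_\gamma + \sum_{p>q_\gamma}\lambda_p^\alpha . \tag{76}
$$

Since the sequence $\{\lambda_p\}$ is non-increasing, the condition $C \overset{\text{def}}{=} \sum_{p\ge1}\lambda_p^\alpha < \infty$ implies that $p\lambda_p^\alpha \le C$. Therefore, $\lambda_p \le C^{1/\alpha}p^{-1/\alpha}$, which implies that for any $p$ satisfying $C\gamma^{-\alpha} < p$, $\lambda_p < \gamma$, showing that $q_\gamma \le C\gamma^{-\alpha}$. Together with (76) and $\sum_{p>q_\gamma}\lambda_p^\alpha \le C$, this establishes (74).

Since $\lambda\mapsto\lambda(\lambda+\gamma)^{-1}$ is non-decreasing, for $p \le q_\gamma$ we have $\lambda_p(\lambda_p+\gamma)^{-1} \ge 1/2$. Therefore, $\gamma^\alpha\sum_{p=1}^{q_\gamma}\lambda_p^\beta(\lambda_p+\gamma)^{-\beta} \ge 2^{-\beta}\gamma^\alpha q_\gamma$. Since $\lim_{p\to\infty}p\lambda_p^\alpha = \infty$, we have $\lambda_p > 0$ for all $p$, which implies that $\lim_{\gamma\to0^+}q_\gamma = \infty$. Moreover, $\lambda_{q_\gamma+1} \le \gamma$, so that $\gamma^\alpha q_\gamma \ge q_\gamma\lambda_{q_\gamma+1}^\alpha$, and therefore $\lim_{\gamma\to0^+}\gamma^\alpha q_\gamma = \infty$. The proof follows. □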
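A quick numerical check of the bound (74) in Lemma [19] follows. It uses the hypothetical sequence $\lambda_p = p^{-2}$ truncated at $2\times10^5$ terms (the truncated sequence is still non-increasing and summable, so the lemma applies to it), with $\alpha = 1$ and $\beta = 2$, in which case the left-hand side is $\gamma\,d_2^2(S,\gamma)$ for an operator $S$ with eigenvalues $\lambda_p$. The supremum over $\gamma$ is approximated on a logarithmic grid.

```python
import numpy as np

def weighted_sum(lam, gamma, alpha, beta):
    """gamma^alpha * sum_p lam_p^beta (lam_p + gamma)^(-beta)."""
    return gamma ** alpha * np.sum((lam / (lam + gamma)) ** beta)

p = np.arange(1, 200_001, dtype=float)
lam = p ** -2.0                      # non-increasing, with sum lam_p^alpha < inf for alpha = 1
alpha, beta = 1.0, 2.0               # beta >= alpha, as required by (74)
bound = 2.0 * np.sum(lam ** alpha)   # right-hand side of (74), close to pi^2 / 3
gammas = np.logspace(-6, 2, 50)      # grid approximating the supremum over gamma > 0
sup_lhs = max(weighted_sum(lam, g, alpha, beta) for g in gammas)
print(sup_lhs, bound)
```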
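Lemma 20. Let $\{\lambda_p\}_{p\ge1}$ be a non-increasing sequence of non-negative numbers. Assume there exists $s > 1$ such that $\lambda_p = p^{-s}$ for all $p \ge 1$. Then, for $r = 1, 2$,

$$
\left\{\sum_{p=1}^\infty(\lambda_p+\gamma)^{-r}\lambda_p^r\right\}^{1/r} = \gamma^{-1/(rs)}\left\{\int_0^\infty(1+v^s)^{-r}\,dv\right\}^{1/r}(1+o(1)), \quad\text{as } \gamma\to0 .
$$

Proof. First note that

$$
\sum_{p=1}^\infty(\lambda_p+\gamma)^{-r}\lambda_p^r = \sum_{p=1}^\infty(1+\gamma\lambda_p^{-1})^{-r} = \sum_{p=1}^\infty\left(1+(\gamma^{1/s}p)^s\right)^{-r}.
$$

For all $\gamma > 0$, the function $u\mapsto(1+(\gamma^{1/s}u)^s)^{-r}$ is decreasing and nonnegative on $[0,\infty)$. Therefore, for all $p \ge 1$ we may write

$$
\int_p^{p+1}\left(1+(\gamma^{1/s}u)^s\right)^{-r}du \le \left(1+(\gamma^{1/s}p)^s\right)^{-r} \le \int_{p-1}^{p}\left(1+(\gamma^{1/s}u)^s\right)^{-r}du ,
$$

that is, with the change of variable $v = \gamma^{1/s}u$,

$$
\gamma^{-1/s}\int_{\gamma^{1/s}p}^{\gamma^{1/s}(p+1)}(1+v^s)^{-r}dv \le \left(1+(\gamma^{1/s}p)^s\right)^{-r} \le \gamma^{-1/s}\int_{\gamma^{1/s}(p-1)}^{\gamma^{1/s}p}(1+v^s)^{-r}dv .
$$

Hence, summing on $p$ over $1,\dots,N-1$, we obtain

$$
\gamma^{-1/s}\int_{\gamma^{1/s}}^{\gamma^{1/s}N}(1+v^s)^{-r}dv \le \sum_{p=1}^{N-1}\left(1+(\gamma^{1/s}p)^s\right)^{-r} \le \gamma^{-1/s}\int_0^{\gamma^{1/s}N}(1+v^s)^{-r}dv .
$$

Therefore, taking $N\to\infty$ in such a way that $\gamma^{1/s}N\to\infty$ as $\gamma\to0$, we finally get

$$
\sum_{p=1}^\infty\left(1+(\gamma^{1/s}p)^s\right)^{-r} = \gamma^{-1/s}\left\{\int_0^\infty(1+v^s)^{-r}dv\right\}(1+o(1)),
$$

and the result follows by raising both sides to the power $1/r$. □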
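The asymptotic equivalence in Lemma [20] can be checked numerically. The sketch below takes $s = 2$, $r = 1$ and $\gamma = 10^{-4}$ (values chosen for illustration), truncates the series at $10^6$ terms, and uses the closed form $\int_0^\infty(1+v^s)^{-1}dv = (\pi/s)/\sin(\pi/s)$, valid for $s > 1$. The ratio of the series to $\gamma^{-1/s}\int_0^\infty(1+v^s)^{-r}dv$ should be close to one.

```python
import numpy as np
from math import pi, sin

s, r = 2.0, 1
gamma = 1e-4
p = np.arange(1, 1_000_001, dtype=float)
lam = p ** -s                                  # lambda_p = p^{-s}, as in Lemma 20
series = np.sum((lam / (lam + gamma)) ** r)    # sum_p (lambda_p + gamma)^{-r} lambda_p^r
integral = (pi / s) / sin(pi / s)              # int_0^inf (1 + v^s)^{-1} dv, for r = 1
prediction = gamma ** (-1.0 / s) * integral
ratio = series / prediction
print(series, prediction, ratio)
```

The remaining gap, roughly 0.3% here, reflects the $o(1)$ term, which shrinks as $\gamma\to0$.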



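Lemma 21. Let $A$ be a self-adjoint compact operator on $\mathcal H$. Then, for any orthonormal basis $\{\varphi_p\}_{p\ge1}$ of $\mathcal H$,

$$
\sum_{p=1}^\infty|\lambda_p(A)| \le \sum_{p=1}^\infty\|A\varphi_p\|_{\mathcal H} .
$$

Proof. Let $\{\psi_p\}_{p\ge1}$ be an orthonormal basis of $\mathcal H$ consisting of eigenvectors of $A$ corresponding to the eigenvalues $\{\lambda_p(A)\}$ of this latter operator, so that $\langle\psi_p,A\psi_p\rangle_{\mathcal H} = \lambda_p(A)$. Then, expanding $A\psi_p$ in the basis $\{\varphi_q\}$ and using that $A$ is self-adjoint,

$$
\begin{aligned}
\sum_{p=1}^\infty|\lambda_p(A)| = \sum_{p=1}^\infty|\langle\psi_p,A\psi_p\rangle_{\mathcal H}|
&\le \sum_{q=1}^\infty\sum_{p=1}^\infty|\langle A\varphi_q,\psi_p\rangle_{\mathcal H}|\,|\langle\varphi_q,\psi_p\rangle_{\mathcal H}|\\
&\le \sum_{q=1}^\infty\left(\sum_{p=1}^\infty\langle A\varphi_q,\psi_p\rangle_{\mathcal H}^2\right)^{1/2}\left(\sum_{p=1}^\infty\langle\varphi_q,\psi_p\rangle_{\mathcal H}^2\right)^{1/2}
= \sum_{q=1}^\infty\|A\varphi_q\|_{\mathcal H},
\end{aligned}
$$

where the second inequality is the Cauchy-Schwarz inequality and the last equality follows from Parseval's identity, since $\|\varphi_q\|_{\mathcal H} = 1$. □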
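Lemma [21] can be illustrated in finite dimension, where $\mathcal H = \mathbb R^8$ and a compact self-adjoint operator is simply a symmetric matrix. The sketch draws a random symmetric matrix and a random orthonormal basis (from a QR factorization) and compares both sides. It also checks that equality holds for an eigenbasis, as expected since then $\|A\psi_p\|_{\mathcal H} = |\lambda_p(A)|$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                  # self-adjoint operator on R^n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns form an orthonormal basis
lhs = np.sum(np.abs(np.linalg.eigvalsh(A)))       # sum_p |lambda_p(A)|
rhs = np.sum(np.linalg.norm(A @ Q, axis=0))       # sum_p ||A phi_p||

_, V = np.linalg.eigh(A)                          # columns: eigenvectors psi_p of A
rhs_eig = np.sum(np.linalg.norm(A @ V, axis=0))   # equals lhs for the eigenbasis
print(lhs, rhs, rhs_eig)
```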
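Appendix B. Perturbation results on covariance operators

Lemma 22. Let $A$ be a compact self-adjoint operator, with $\{\lambda_p\}_{p\ge1}$ the eigenvalues of $A$ and $\{e_p\}_{p\ge1}$ an orthonormal system of eigenvectors of $A$, and let $B$ be a bounded operator. Then, for all integers $k \ge 1$, using the convention $p_{k+1} = p_1$,

$$
\sum_{p=1}^\infty\langle e_p,(AB)^ke_p\rangle = \sum_{p_1=1}^\infty\sum_{p_2=1}^\infty\cdots\sum_{p_k=1}^\infty\left\{\left(\prod_{j=1}^k\lambda_{p_j}\right)\left(\prod_{j=1}^k\langle e_{p_j},Be_{p_{j+1}}\rangle\right)\right\}.
$$

Proof. Let $k$ be some integer, fixed throughout the proof. The proof is by induction, that is, we shall prove that, for all $\ell\in\{1,\dots,k\}$,

$$
\sum_{p=1}^\infty\langle e_p,(AB)^ke_p\rangle = \sum_{p_1=1}^\infty\cdots\sum_{p_\ell=1}^\infty\left\{\left(\prod_{j=1}^{\ell-1}\lambda_{p_j}\right)\left(\prod_{j=1}^{\ell-1}\langle e_{p_j},Be_{p_{j+1}}\rangle\right)\langle e_{p_\ell},(AB)^{k-\ell+1}e_{p_1}\rangle\right\}. \tag{$\mathcal P(\ell)$}
$$

The statement $\mathcal P(1)$ holds trivially, empty products being equal to one. For $\ell = 2$, using that $A^*e_{p_1} = Ae_{p_1} = \lambda_{p_1}e_{p_1}$ and $B^*e_{p_1} = \sum_{p_2=1}^\infty\langle e_{p_1},Be_{p_2}\rangle e_{p_2}$, we indeed have

$$
\begin{aligned}
\sum_{p_1=1}^\infty\langle e_{p_1},AB(AB)^{k-1}e_{p_1}\rangle
&= \sum_{p_1=1}^\infty\lambda_{p_1}\langle B^*e_{p_1},(AB)^{k-1}e_{p_1}\rangle
= \sum_{p_1=1}^\infty\lambda_{p_1}\left\langle\sum_{p_2=1}^\infty\langle e_{p_1},Be_{p_2}\rangle e_{p_2},(AB)^{k-1}e_{p_1}\right\rangle\\
&= \sum_{p_1=1}^\infty\sum_{p_2=1}^\infty\lambda_{p_1}\langle e_{p_1},Be_{p_2}\rangle\langle e_{p_2},(AB)^{k-1}e_{p_1}\rangle .
\end{aligned}
$$

Assume the statement $\mathcal P(\ell)$ is true, with $\ell \le k-1$. Let us now marginalize out, first $A$ then $B$ in $(AB)^{k-\ell+1}$, for the $(\ell+1)$-th time, by summing over an index $p_{\ell+1}$. Using the same arguments as above, that is $A^*e_{p_\ell} = \lambda_{p_\ell}e_{p_\ell}$ and $B^*e_{p_\ell} = \sum_{p_{\ell+1}=1}^\infty\langle e_{p_\ell},Be_{p_{\ell+1}}\rangle e_{p_{\ell+1}}$, we get

$$
\begin{aligned}
\sum_{p=1}^\infty\langle e_p,(AB)^ke_p\rangle
&= \sum_{p_1=1}^\infty\cdots\sum_{p_\ell=1}^\infty\left\{\left(\prod_{j=1}^{\ell-1}\lambda_{p_j}\right)\left(\prod_{j=1}^{\ell-1}\langle e_{p_j},Be_{p_{j+1}}\rangle\right)\langle e_{p_\ell},AB(AB)^{k-\ell}e_{p_1}\rangle\right\}\\
&= \sum_{p_1=1}^\infty\cdots\sum_{p_\ell=1}^\infty\left\{\left(\prod_{j=1}^{\ell}\lambda_{p_j}\right)\left(\prod_{j=1}^{\ell-1}\langle e_{p_j},Be_{p_{j+1}}\rangle\right)\langle B^*e_{p_\ell},(AB)^{k-\ell}e_{p_1}\rangle\right\}\\
&= \sum_{p_1=1}^\infty\cdots\sum_{p_{\ell+1}=1}^\infty\left\{\left(\prod_{j=1}^{\ell}\lambda_{p_j}\right)\left(\prod_{j=1}^{\ell}\langle e_{p_j},Be_{p_{j+1}}\rangle\right)\langle e_{p_{\ell+1}},(AB)^{k-\ell}e_{p_1}\rangle\right\},
\end{aligned}
$$

which proves $\mathcal P(\ell+1)$.

The proof is concluded by a $k$-step induction, that is, once $A$ in $(AB)^k$ has eventually been marginalized out $k$ times and only the last term $\langle e_{p_k},Be_{p_1}\rangle$ remains. □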
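The trace expansion of Lemma [22] can be verified by brute force in finite dimension. In the sketch below, with illustrative sizes $n = 4$ and $k = 3$, $A$ is a random symmetric matrix, $B$ an arbitrary matrix, and the $k$-fold sum over $(p_1,\dots,p_k)$ is enumerated explicitly with the convention $p_{k+1} = p_1$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                  # self-adjoint A (a symmetric matrix)
B = rng.standard_normal((n, n))    # arbitrary bounded operator B
lam, E = np.linalg.eigh(A)         # columns of E are the eigenvectors e_p

lhs = np.trace(np.linalg.matrix_power(A @ B, k))  # sum_p <e_p, (AB)^k e_p>
Bt = E.T @ B @ E                   # Bt[i, j] = <e_i, B e_j>
rhs = 0.0
for idx in itertools.product(range(n), repeat=k):
    nxt = idx[1:] + idx[:1]        # (p_2, ..., p_k, p_1): convention p_{k+1} = p_1
    rhs += np.prod(lam[list(idx)]) * np.prod([Bt[i, j] for i, j in zip(idx, nxt)])
print(lhs, rhs)
```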

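Lemma 23. Let $\gamma > 0$, and $S$ a trace-class operator. Denote by $\{\lambda_p\}_{p\ge1}$ and $\{e_p\}_{p\ge1}$ respectively the positive eigenvalues and the corresponding eigenvectors of $S$. Consider $d_r(T,\gamma)$ for $r = 1, 2$, with $T$ a compact operator, as defined in (6). If $\Delta$ is a trace-class perturbation operator such that $\|(S+\gamma I)^{-1}\Delta\| < 1$ and $\|\Delta\|_C \overset{\text{def}}{=} \sum_{p=1}^\infty\|\Delta e_p\| < \gamma$, then

$$
|d_r(S+\Delta,\gamma) - d_r(S,\gamma)| \le \frac{\gamma^{-1}\|\Delta\|_C}{1-\gamma^{-1}\|\Delta\|_C}, \quad\text{for } r = 1, 2. \tag{77}
$$

If $d_2(S,\gamma)\|S^{-1/2}\Delta S^{-1/2}\|_{HS} < 1$, then

$$
|d_1(S+\Delta,\gamma) - d_1(S,\gamma)| \le \frac{d_2(S,\gamma)\|S^{-1/2}\Delta S^{-1/2}\|_{HS}}{1-d_2(S,\gamma)\|S^{-1/2}\Delta S^{-1/2}\|_{HS}}, \tag{78}
$$

$$
|d_2(S+\Delta,\gamma) - d_2(S,\gamma)| \le \frac{\|S^{-1/2}\Delta S^{-1/2}\|_{HS}}{1-\|S^{-1/2}\Delta S^{-1/2}\|_{HS}}. \tag{79}
$$

Proof. If $\|(S+\gamma I)^{-1}\Delta\| < 1$, then we may write

$$
\begin{aligned}
(S+\Delta+\gamma I)^{-1}(S+\Delta) &= \left(I+(S+\gamma I)^{-1}\Delta\right)^{-1}(S+\gamma I)^{-1}(S+\Delta)\\
&= \sum_{k=0}^\infty(-1)^k\{(S+\gamma I)^{-1}\Delta\}^k(S+\gamma I)^{-1}(S+\Delta)\\
&= (S+\gamma I)^{-1}S + \sum_{k=1}^\infty(-1)^k\{(S+\gamma I)^{-1}\Delta\}^k\left((S+\gamma I)^{-1}S - I\right),
\end{aligned}
$$

where the series converge in operator norm. Since the trace is continuous in the space of trace-class operators, and using $\|(S+\gamma I)^{-1}S - I\| \le 1$, we get by linearity of the trace

$$
\begin{aligned}
|d_1(S+\Delta,\gamma) - d_1(S,\gamma)| &= \left|\mathrm{Tr}\{(S+\Delta+\gamma I)^{-1}(S+\Delta)\} - \mathrm{Tr}\{(S+\gamma I)^{-1}S\}\right|\\
&\le \sum_{k=1}^\infty\left|\mathrm{Tr}\left[\{(S+\gamma I)^{-1}\Delta\}^k\{(S+\gamma I)^{-1}S - I\}\right]\right| \le \sum_{k=1}^\infty\left|\mathrm{Tr}\{((S+\gamma I)^{-1}\Delta)^k\}\right| .
\end{aligned}
\tag{80}
$$

Applying Lemma [22] with $B = \Delta$ and $A = (S+\gamma I)^{-1}$, we obtain

$$
\mathrm{Tr}\{((S+\gamma I)^{-1}\Delta)^k\} = \sum_{p=1}^\infty\langle e_p,((S+\gamma I)^{-1}\Delta)^ke_p\rangle
= \sum_{p_1=1}^\infty\cdots\sum_{p_k=1}^\infty\left\{\left(\prod_{j=1}^k(\lambda_{p_j}+\gamma)^{-1}\right)\left(\prod_{j=1}^k\langle e_{p_j},\Delta e_{p_{j+1}}\rangle\right)\right\}.
$$

Since, for all $1 \le j \le k$, we have $|\langle e_{p_j},\Delta e_{p_{j+1}}\rangle| \le \|\Delta e_{p_j}\|$ and $(\lambda_{p_j}+\gamma)^{-1} \le \gamma^{-1}$, the upper bound in (80) is actually the sum of a geometric series whose ratio is $\gamma^{-1}\sum_{p=1}^\infty\|\Delta e_p\| = \gamma^{-1}\|\Delta\|_C$, where $\gamma^{-1}\|\Delta\|_C < 1$ by assumption, which completes the proof of (77) when $r = 1$. A similar reasoning allows us to prove (77) when $r = 2$.

We now prove the second upper bound (78). Using that

$$
\mathrm{Tr}\{((S+\gamma I)^{-1}\Delta)^k\} = \mathrm{Tr}\left[\left\{\left(S^{1/2}(S+\gamma I)^{-1}S^{1/2}\right)\left(S^{-1/2}\Delta S^{-1/2}\right)\right\}^k\right],
$$

we may apply Lemma [22] again, but with $B = S^{-1/2}\Delta S^{-1/2}$ and $A = S^{1/2}(S+\gamma I)^{-1}S^{1/2}$, yielding

$$
\mathrm{Tr}\{((S+\gamma I)^{-1}\Delta)^k\} = \sum_{p_1=1}^\infty\cdots\sum_{p_k=1}^\infty\left\{\left(\prod_{j=1}^k\frac{\lambda_{p_j}}{\lambda_{p_j}+\gamma}\right)\left(\prod_{j=1}^k\langle e_{p_j},(S^{-1/2}\Delta S^{-1/2})e_{p_{j+1}}\rangle\right)\right\}.
$$

Then, using that $|\langle e_{p_j},(S^{-1/2}\Delta S^{-1/2})e_{p_{j+1}}\rangle| \le \|(S^{-1/2}\Delta S^{-1/2})e_{p_j}\|$ and applying Hölder's inequality, we obtain

$$
\left|\mathrm{Tr}\{((S+\gamma I)^{-1}\Delta)^k\}\right| \le \left\{\sum_{p=1}^\infty\lambda_p^2(\lambda_p+\gamma)^{-2}\right\}^{k/2}\left\{\sum_{p_1=1}^\infty\cdots\sum_{p_k=1}^\infty\prod_{j=1}^k\|(S^{-1/2}\Delta S^{-1/2})e_{p_j}\|^2\right\}^{1/2} \le d_2^k(S,\gamma)\,\|S^{-1/2}\Delta S^{-1/2}\|_{HS}^k .
$$

Finally, going back to (80), the upper bound is actually the sum of a geometric series whose ratio is $d_2(S,\gamma)\|S^{-1/2}\Delta S^{-1/2}\|_{HS}$, where $d_2(S,\gamma)\|S^{-1/2}\Delta S^{-1/2}\|_{HS} < 1$ by assumption, which completes the proof of (78). As for (79), observe that

$$
\begin{aligned}
|d_2(S+\Delta,\gamma) - d_2(S,\gamma)| &\le \sum_{k=1}^\infty\left\|\{(S+\gamma I)^{-1}\Delta\}^k\{(S+\gamma I)^{-1}S - I\}\right\|_{HS}\\
&\le \sum_{k=1}^\infty\left\|\{(S+\gamma I)^{-1}\Delta\}^k\right\|_{HS} \le \sum_{k=1}^\infty\left\|S^{-1/2}\Delta S^{-1/2}\right\|_{HS}^k ,
\end{aligned}
$$

where we used the inequality $\|AB\|_{HS} \le \|A\|_{HS}\|B\|$, together with $\|(S+\gamma I)^{-1}S - I\| \le 1$ and $\|(S+\gamma I)^{-1}S\| \le 1$. □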
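A finite-dimensional sanity check of (77) in Lemma [23] follows, under assumptions chosen only for illustration. $S$ is a diagonal $4\times4$ matrix, so its eigenvectors $e_p$ are the standard basis vectors and $\|\Delta\|_C$ is the sum of the column norms of $\Delta$; $\gamma = 0.5$; and $\Delta$ is a random symmetric perturbation rescaled so that $\|\Delta\|_C = 0.05$, which keeps $S+\Delta$ positive definite and makes both assumptions of the lemma hold. The helper `d_r` is a hypothetical implementation which assumes that (6) reads $d_r(T,\gamma) = \{\mathrm{Tr}[(T+\gamma I)^{-r}T^r]\}^{1/r}$.

```python
import numpy as np

def d_r(T, gamma, r):
    """Assumed form of (6): {Tr[(T + gamma I)^{-r} T^r]}^{1/r} for a symmetric matrix T."""
    X = np.linalg.solve(T + gamma * np.eye(T.shape[0]), T)  # (T + gamma I)^{-1} T
    return np.trace(np.linalg.matrix_power(X, r)) ** (1.0 / r)

rng = np.random.default_rng(3)
S = np.diag([1.0, 0.5, 0.25, 0.125])   # trace-class S; eigenvectors = standard basis
gamma = 0.5
M = rng.standard_normal((4, 4))
Delta = (M + M.T) / 2
Delta *= 0.05 / np.sum(np.linalg.norm(Delta, axis=0))   # rescale so that ||Delta||_C = 0.05
ratio = np.sum(np.linalg.norm(Delta, axis=0)) / gamma     # gamma^{-1} ||Delta||_C = 0.1
bound = ratio / (1.0 - ratio)                             # right-hand side of (77)
gaps = [abs(d_r(S + Delta, gamma, r) - d_r(S, gamma, r)) for r in (1, 2)]
print(gaps, bound)
```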



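Appendix C. Miscellaneous proofs

Proposition 24. Assume (A1) and (B1). Assume in addition that $P_1 = P_2$ and $\gamma_n \equiv \gamma$. Then

$$
\sup_x\left|\mathbb P\left(T_\infty(\hat\Sigma_W,\gamma) < x\right) - \mathbb P\left(T_\infty(\Sigma_W,\gamma) < x\right)\right| \xrightarrow{\;P\;} 0, \tag{81}
$$

where $T_\infty(S,\gamma)$ for a trace-class operator $S$ is defined in (10).

Proof. First, define the random variables $\{Y_n\}$ and $Y$ as follows:

$$
Y_n \overset{\text{def}}{=} \sum_{p=1}^\infty(\hat\lambda_p+\gamma)^{-1}\hat\lambda_p(Z_p^2-1), \qquad Y \overset{\text{def}}{=} \sum_{p=1}^\infty(\lambda_p+\gamma)^{-1}\lambda_p(Z_p^2-1),
$$

where $\{Z_p\}_{p\ge1}$ are independent standard normal variables, and $\{\hat\lambda_p\}$ and $\{\lambda_p\}$ denote the eigenvalues of $\hat\Sigma_W$ and $\Sigma_W$, respectively. Considering the random element $h \in \mathcal H$ such that $\langle h,e_p\rangle_{\mathcal H} = Z_p$ for all $p \ge 1$, we may express $Y_n$ and $Y$ as quadratic forms in $h$, centered by $d_1(\hat\Sigma_W,\gamma)$ and $d_1(\Sigma_W,\gamma)$, respectively. Then, using Eq. (77) for $r = 1$ in Lemma [23] with $S = \Sigma_W$, and Corollary [13], which gives $\|\hat\Sigma_W - \Sigma_W\|_{HS} = O_P(n^{-1/2})$, we get $|Y_n - Y| = O_P(n^{-1/2})$, and hence $Y_n \to Y$ in probability. Next, applying Pólya's theorem (Lehmann and Romano, 2005, Theorem 11.2.9) gives the result

$$
\sup_x\left|\mathbb P(Y_n < x) - \mathbb P(Y < x)\right| \xrightarrow{\;P\;} 0 .
$$

□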
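The limiting variable $Y$ in the proof of Proposition 24 is easy to simulate. The sketch below, for a hypothetical spectrum $\lambda_p = p^{-2}$ truncated at 50 terms and $\gamma = 0.01$, draws $Y = \sum_p(\lambda_p+\gamma)^{-1}\lambda_p(Z_p^2-1)$ by Monte Carlo and checks two exact facts: $\mathbb E\,Y = 0$, and $\mathrm{Var}\,Y = 2\sum_p\lambda_p^2(\lambda_p+\gamma)^{-2}$, which is $2d_2^2(\Sigma_W,\gamma)$ under the form of $d_2$ assumed above; the latter holds because $\mathrm{Var}(Z_p^2) = 2$.

```python
import numpy as np

rng = np.random.default_rng(4)
gamma = 0.01
lam = np.arange(1, 51, dtype=float) ** -2.0  # hypothetical eigenvalues (truncated at 50)
c = lam / (lam + gamma)                      # weights (lambda_p + gamma)^{-1} lambda_p
d2_sq = np.sum(c ** 2)                       # sum_p lambda_p^2 (lambda_p + gamma)^{-2}
Z = rng.standard_normal((50_000, lam.size))  # independent standard normal Z_p
Y = (Z ** 2 - 1.0) @ c                       # Monte Carlo draws of Y
print(Y.mean(), Y.var() / (2.0 * d2_sq))
```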



Appendix D. Eigenvalues of covariance operators

In this section, we give new general results regarding the decay of eigenvalues of covariance operators. We assume that we have a bounded density $p(x)$ on $\mathbb R^p$ with respect to the Lebesgue measure, and a translation-invariant kernel $k(x-y)$ with positive integrable Fourier transform. In this section, we consider eigenvalues of the second-order moment operator, which dominates the covariance operator. From the proof of Proposition [10], the eigenvalues of the second-order moment operator are the eigenvalues of the following operator from $L^2(\mathbb R^p)$ to $L^2(\mathbb R^p)$, defined as

$$
Qf(x) = \int p(x)^{1/2}k(x-y)f(y)p(y)^{1/2}\,dy .
$$

We denote by $\lambda_n(p,K)$ the eigenvalues of this operator ranked in decreasing order.

We denote by $T(p)$ the pointwise multiplication by $p$, defined from $L^2(\mathbb R^p)$ to $L^2(\mathbb R^p)$. We also denote by $C(k)$ the convolution operator by $k$. We thus get $Q = T(p)^{1/2}C(k)T(p)^{1/2}$. Note that by taking Fourier transforms ($P$ of $p$, and $K$ of $k$), the eigenvalues are the same as those of $T(K)^{1/2}C(P)T(K)^{1/2}$, and thus $p$ and $K$ play equivalent roles (Widom, 1963).

The following lemma, taken from Widom (1964), gives an upper bound on the eigenvalues in the situation where $p$ and $K$ are indicator functions.

Lemma 25. Let $\varepsilon > 0$. Then there exists $\delta > 0$ such that, if $p(x)$ is the indicator function of $[-1,1]$ and $K$ is the indicator function of $[-\gamma,\gamma]$, with $\gamma \le (1-\varepsilon)n\pi/2$, then $\lambda_n(p,K) \le e^{-n\delta}$.
This result is very useful because it is uniform in $\gamma$, as long as $\gamma \le (1-\varepsilon)n\pi/2$. We now take $\varepsilon = 1/2$, and we thus get $\lambda_n(1_{[-1,1]},1_{[-n\pi/4,\,n\pi/4]}) \le e^{-n\delta}$ for some $\delta > 0$.

We consider the tail behavior of $p(x)$ and of the Fourier transform $K(\omega)$ of $k$, through $M(p,u) = \max_{\|x\|_\infty \ge u}p(x)$ and $M(K,v) = \max_{\|\omega\|_\infty \ge v}K(\omega)$, where, for $x = (x_1,\dots,x_p)$, $\|x\|_\infty = \max_{1\le j\le p}|x_j|$. We also denote by $M_0(K)$ and $M_0(p)$ the suprema of $K$ and $p$ over $\mathbb R^p$.

Proposition 26. For all $(u,v)$ such that $uv = n\pi/4$,

$$
\lambda_n(p,K) \le M(p,u)M_0(K) + M(K,v)M_0(p) + M_0(K)M_0(p)e^{-\delta n^{1/p}} .
$$

Proof. We divide $\mathbb R^p$ twice into two parts: the spatial version $\mathbb R^p = \{x : \|x\|_\infty \le u\}\cup\{x : \|x\|_\infty > u\} = A_u\cup B_u$, and the Fourier version $\mathbb R^p = \{\omega : \|\omega\|_\infty \le v\}\cup\{\omega : \|\omega\|_\infty > v\} = A_v\cup B_v$. We have, for all $p$ and $K$,

$$
\lambda_n(p,K) \le \lambda_n(p1_{A_u},K) + \lambda_1(p1_{B_u},K),
$$

which is a classical result for perturbation of eigenvalues. By definition of $M(p,u)$, we have $T(p1_{B_u}) \preccurlyeq M(p,u)I$, and moreover $C(k) \preccurlyeq M_0(K)I$, which implies that $\lambda_1(p1_{B_u},K) \le M(p,u)M_0(K)$. We thus get

$$
\lambda_n(p,K) \le \lambda_n(p1_{A_u},K) + M(p,u)M_0(K).
$$

Similarly, we get

$$
\lambda_n(p,K) \le \lambda_n(p1_{A_u},K1_{A_v}) + M(p,u)M_0(K) + M(K,v)M_0(p).
$$

We know that if two operators satisfy $A \preccurlyeq B$, then $\lambda_n(A) \le \lambda_n(B)$; thus, since $T(p1_{A_u}) \preccurlyeq T(M_0(p)1_{A_u})$, and similarly for $K$, we get

$$
\lambda_n(p,K) \le M_0(K)M_0(p)\lambda_n(1_{A_u},1_{A_v}) + M(p,u)M_0(K) + M(K,v)M_0(p).
$$

By a simple change of variable, it is easy to show that $\lambda_n(1_{A_u},1_{A_v}) = \lambda_n(1_{A_1},1_{A_{vu}})$. When $p = 1$, we immediately have $\lambda_n(1_{A_1},1_{A_{vu}}) \le e^{-n\delta}$. When $p > 1$, we notice that the eigenfunctions and eigenvalues of the operators are products of eigenfunctions and eigenvalues of the univariate operators. That is, the eigenvalues are of the form $\mu_{i_1}\times\cdots\times\mu_{i_p}$, where $(i_1,\dots,i_p)$ are positive integers and $\mu_i \le e^{-i\delta}$ are the eigenvalues of the univariate operator. From the product formulation, we get that if $n$ is equal to the number of ordered partitions of a certain integer $k$ into $p$ strictly positive integers, then $\lambda_n \le e^{-k\delta}$. This number of partitions is exactly equal to $[(p-1)!(k-p)!]^{-1}(k-1)! \le (k-1)^p$.

Thus, given any $n$, we can find an integer $k$ such that $(k-1)^p \le n$, and we have $\lambda_n(1_{A_1},1_{A_{vu}}) \le e^{-k\delta}$. This leads to $\lambda_n(1_{A_1},1_{A_{vu}}) \le e^{-\delta n^{1/p}}$. The proposition follows. □

We can now derive a number of corollaries.

Corollary 27. If $p(x)$ is upper bounded by a constant times $e^{-\alpha\|x\|^2}$ and $K(\omega)$ is upper bounded by a constant times $e^{-\beta\|\omega\|^2}$, then there exists $\eta > 0$ such that $\lambda_n(p,K) = O(e^{-\eta n^{1/p}})$.

Proof. Take $u = v = \sqrt{n\pi/4}$. □
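Corollary 27 can be illustrated in dimension one ($p = 1$) with the standard normal density and the Gaussian kernel $k(t) = e^{-t^2/2}$, whose Fourier transform is also Gaussian. The sketch below discretizes $Q$ on a uniform grid (a Nyström-type approximation; the grid is an illustrative choice). For this pair the eigenvalues are known in closed form (see, e.g., Rasmussen and Williams, 2006, Chapter 4): indexing from $0$, $\lambda_n = \sqrt{2a/A}\,B^n$ with $a = 1/4$, $b = 1/2$, $A = a+b+\sqrt{a^2+2ab}$ and $B = b/A$, that is, $\lambda_0 \approx 0.618$ and a constant ratio $B \approx 0.382$. This is the geometric decay predicted by Corollary 27 when $p = 1$.

```python
import numpy as np

# Uniform-grid (Nystrom-type) discretisation of
# Q f(x) = int p(x)^{1/2} k(x - y) f(y) p(y)^{1/2} dy, with p = N(0, 1), k(t) = exp(-t^2 / 2).
x = np.linspace(-10.0, 10.0, 801)
h = x[1] - x[0]
dens = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)          # density p(x)
kern = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2)       # kernel matrix k(x_i - x_j)
Qmat = np.sqrt(dens)[:, None] * kern * np.sqrt(dens)[None, :] * h
eig = np.sort(np.linalg.eigvalsh(Qmat))[::-1]            # eigenvalues in decreasing order
print(eig[:6])
print(eig[1:6] / eig[:5])   # successive ratios, close to B
```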

Corollary 28. If $p(x)$ is upper bounded by a constant times $(1+\|x\|)^{-\alpha}$ (with $\alpha > p$, which ensures integrability) and $K(\omega)$ is upper bounded by a constant times $e^{-\beta\|\omega\|^2}$, then $\lambda_n(p,K) = O(n^{-\alpha+\eta})$ for any $\eta > 0$.

Proof. Take $v$ proportional to $n^{\eta/\alpha}$. □

Corollary 29. If $p(x)$ is upper bounded by a constant times $(1+\|x\|)^{-\alpha}$ (with $\alpha > p$, which ensures integrability) and $K(\omega)$ is upper bounded by a constant times $(1+\|\omega\|)^{-\beta}$ (with $\beta > p$, which ensures integrability), then $\lambda_n(p,K) = O(n^{-\alpha\beta/(\alpha+\beta)})$.

Proof. Take $v$ proportional to $n^{\alpha/(\alpha+\beta)}$. □
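The choice of $v$ in the proof of Corollary 29 balances the two polynomial terms of Proposition 26. The sketch below uses illustrative values $\alpha = 3$ and $\beta = 2$, sets all constants to one, drops the exponentially small term, and replaces $M(p,u)$ and $M(K,v)$ by the envelopes $(1+u)^{-\alpha}$ and $(1+v)^{-\beta}$. It evaluates their sum along $v = n^{\alpha/(\alpha+\beta)}$, $u = n\pi/(4v)$, and fits the log-log slope, which should approach $-\alpha\beta/(\alpha+\beta) = -1.2$.

```python
import numpy as np

alpha, beta = 3.0, 2.0
n = np.logspace(4, 8, 9)
v = n ** (alpha / (alpha + beta))               # choice of v from the proof of Corollary 29
u = n * np.pi / (4.0 * v)                       # so that u v = n pi / 4
bound = (1 + u) ** -alpha + (1 + v) ** -beta    # polynomial terms of Proposition 26 (constants = 1)
slope = np.polyfit(np.log(n), np.log(bound), 1)[0]
print(slope, -alpha * beta / (alpha + beta))
```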